Proteases

ABSTRACT

The invention provides human cysteine proteases and polynucleotides which encode those proteases. The invention also provides expression vectors, host cells, antibodies, agonists, and antagonists, as well as methods for diagnosing, treating, or preventing disorders associated with aberrant expression of cysteine proteases.

TECHNICAL FIELD

This invention relates to nucleic acid and amino acid sequences of proteases and to the use of these sequences in the diagnosis, treatment, and prevention of gastrointestinal, cardiovascular, autoimmune/inflammatory, cell proliferative, developmental, epithelial, neurological, and reproductive disorders, and in the assessment of the effects of exogenous compounds on the expression of nucleic acid and amino acid sequences of proteases.

BACKGROUND OF THE INVENTION

Proteases cleave proteins and peptides at the peptide bond that forms the backbone of the protein or peptide chain. Proteolysis is one of the most important and frequent enzymatic reactions that occur both within and outside of cells. Proteolysis is responsible for the activation and maturation of nascent polypeptides, the degradation of misfolded and damaged proteins, and the controlled turnover of peptides within the cell. Proteases participate in digestion, endocrine function, and tissue remodeling during embryonic development, wound healing, and normal growth. Proteases can play a role in regulatory processes by affecting the half-life of regulatory proteins. Proteases are involved in the etiology or progression of disease states such as inflammation, angiogenesis, tumor dispersion and metastasis, cardiovascular disease, neurological disease, and bacterial, parasitic, and viral infections.

Proteases can be categorized on the basis of where they cleave their substrates. Exopeptidases, which include aminopeptidases, dipeptidyl peptidases, tripeptidases, carboxypeptidases, peptidyl-dipeptidases, dipeptidases, and omega peptidases, cleave residues at the termini of their substrates. Endopeptidases, including serine proteases, cysteine proteases, and metalloproteases, cleave at residues within the peptide. Four principal categories of mammalian proteases have been identified based on active site structure, mechanism of action, and overall three-dimensional structure. (See Beynon, R. J. and J. S. Bond (1994) Proteolytic Enzymes: A Practical Approach, Oxford University Press, New York N.Y., pp. 1-5.)

Serine Proteases

The serine proteases (SPs) are a large, widespread family of proteolytic enzymes that include the digestive enzymes trypsin and chymotrypsin, components of the complement and blood-clotting cascades, and enzymes that control the degradation and turnover of macromolecules within the cell and in the extracellular matrix. Most of the more than 20 subfamilies can be grouped into six clans, each with a common ancestor. These six clans are hypothesized to have descended from at least four evolutionarily distinct ancestors. SPs are named for the presence of a serine residue found in the active catalytic site of most families. The active site is defined by the catalytic triad, a set of conserved aspartate, histidine, and serine residues critical for catalysis. These residues form a charge relay network that activates the serine for nucleophilic attack on the substrate. Other residues outside the active site form an oxyanion hole that stabilizes the tetrahedral transition intermediate formed during catalysis. SPs have a wide range of substrates and can be subdivided into subfamilies on the basis of their substrate specificity. The main subfamilies are named for the residue(s) after which they cleave: trypases (after arginine or lysine), aspases (after aspartate), chymases (after phenylalanine or leucine), metases (after methionine), and serases (after serine) (Rawlings, N. D. and A. J. Barrett (1994) Methods Enzymol. 244:19-61).
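As an illustration of how the P1 rules listed above translate into predicted cleavage sites, the following Python sketch scans a protein sequence (one-letter codes) and splits it after every residue that a given subfamily would recognize. It assumes, as a simplification, that only the residue immediately before the scissile bond (P1) matters; real SP specificity also depends on neighboring residues (for example, trypsin cleaves poorly before proline). The dictionary `P1_RULES` and the functions `cleavage_sites` and `fragments` are hypothetical names introduced here for illustration.

```python
# Simplified P1 rules taken from the subfamily names above.
# Assumption: the P1 residue alone decides whether a bond is cleaved.
P1_RULES = {
    "trypase": {"R", "K"},
    "aspase": {"D"},
    "chymase": {"F", "L"},
    "metase": {"M"},
    "serase": {"S"},
}


def cleavage_sites(sequence: str, subfamily: str) -> list[int]:
    """Return 1-based positions of P1 residues after which the subfamily cleaves."""
    allowed = P1_RULES[subfamily]
    # The final residue has no following peptide bond, so it is excluded.
    return [i + 1 for i, aa in enumerate(sequence[:-1]) if aa in allowed]


def fragments(sequence: str, subfamily: str) -> list[str]:
    """Split the sequence at every predicted cleavage site."""
    pieces, start = [], 0
    for pos in cleavage_sites(sequence, subfamily):
        pieces.append(sequence[start:pos])
        start = pos
    pieces.append(sequence[start:])
    return pieces
```

For example, `fragments("MAKRSDLF", "trypase")` yields `["MAK", "R", "SDLF"]`, while the chymase rule ignores the C-terminal phenylalanine because no bond follows it.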

Most mammalian serine proteases are synthesized as zymogens, inactive precursors that are activated by proteolysis. For example, trypsinogen is converted to its active form, trypsin, by enteropeptidase. Enteropeptidase is an intestinal protease that removes an N-terminal fragment from trypsinogen. The remaining active fragment is trypsin, which in turn activates the precursors of the other pancreatic enzymes. Likewise, proteolysis of prothrombin, the precursor of thrombin, generates three separate polypeptide fragments. The N-terminal fragment is released while the other two fragments, which comprise active thrombin, remain associated through disulfide bonds.

The two largest SP subfamilies are the chymotrypsin (S1) and subtilisin (S8) families. Some members of the chymotrypsin family contain two structural domains unique to this family. Kringle domains are triple-looped, disulfide cross-linked domains found in varying copy number. Kringles are thought to play a role in binding mediators such as membranes, other proteins, or phospholipids, and in the regulation of proteolytic activity (PROSITE PDOC00020). Apple domains are 90-amino-acid repeated domains, each containing six conserved cysteines. Three disulfide bonds link the first and sixth, second and fifth, and third and fourth cysteines (PROSITE PDOC00376). Apple domains are involved in protein-protein interactions. S1 family members include trypsin, chymotrypsin, coagulation factors IX-XII, complement factors B, C, and D, granzymes, kallikrein, and tissue- and urokinase-plasminogen activators. The subtilisin family has members found in the eubacteria, archaebacteria, eukaryotes, and viruses. Subtilisins include the proprotein-processing endopeptidases kexin and furin and the pituitary prohormone convertases PC1, PC2, PC3, PC6, and PACE4 (Rawlings and Barrett, supra).
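The nested disulfide pattern of the apple domain (C1-C6, C2-C5, C3-C4) can be written down mechanically once the six cysteines are located. The Python sketch below does this under the assumption that the input string is exactly one apple domain; it does not attempt to find domain boundaries within a longer protein, and the function name `apple_disulfides` is hypothetical.

```python
def apple_disulfides(domain: str) -> list[tuple[int, int]]:
    """Return 1-based (Cys, Cys) position pairs for a single apple domain."""
    cys = [i + 1 for i, aa in enumerate(domain.upper()) if aa == "C"]
    if len(cys) != 6:
        raise ValueError(f"expected 6 cysteines, found {len(cys)}")
    # Pair the k-th cysteine with the (7 - k)-th: C1-C6, C2-C5, C3-C4.
    return [(cys[k], cys[5 - k]) for k in range(3)]
```

For a hypothetical toy string `"CAACAACAACAACAAC"` (not a real apple domain), the cysteines sit at positions 1, 4, 7, 10, 13, and 16, so the function returns `[(1, 16), (4, 13), (7, 10)]`.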

SPs have functions in many normal processes and some have been implicated in the etiology or treatment of disease. Enterokinase, the initiator of intestinal digestion, is found in the intestinal brush border, where it cleaves the acidic propeptide from trypsinogen to yield active trypsin (Kitamoto, Y. et al. (1994) Proc. Natl. Acad. Sci. USA 91:7588-7592). Prolylcarboxypeptidase, a lysosomal serine peptidase that cleaves peptides such as angiotensin II and III and [des-Arg9]bradykinin, shares sequence homology with members of both the serine carboxypeptidase and prolylendopeptidase families (Tan, F. et al. (1993) J. Biol. Chem. 268:16631-16638). The protease neuropsin may influence synapse formation and neuronal connectivity in the hippocampus in response to neural signaling (Chen, Z.-L. et al. (1995) J. Neurosci. 15:5088-5097). Tissue plasminogen activator is useful for acute management of stroke (Zivin, J. A. (1999) Neurology 53:14-19) and myocardial infarction (Ross, A. M. (1999) Clin. Cardiol. 22:165-171). Some receptors (PAR, for proteinase-activated receptor), highly expressed throughout the digestive tract, are activated by proteolytic cleavage of an extracellular domain. The major PAR agonists (thrombin, trypsin, and mast cell tryptase) are released in allergy and inflammatory conditions. Control of PAR activation by proteases has been suggested as a promising therapeutic target (Vergnolle, N. (2000) Aliment. Pharmacol. Ther. 14:257-266; Rice, K. D. et al. (1998) Curr. Pharm. Des. 4:381-396). Prostate-specific antigen (PSA) is a kallikrein-like serine protease synthesized and secreted exclusively by epithelial cells in the prostate gland. Serum PSA is elevated in prostate cancer and is the most sensitive physiological marker for monitoring cancer progression and response to therapy. PSA can also identify the prostate as the origin of a metastatic tumor (Brawer, M. K. and P. H. Lange (1989) Urology 33:11-16).

The signal peptidase is a specialized class of SP found in all prokaryotic and eukaryotic cell types that serves in the processing of signal peptides from certain proteins. Signal peptides are amino-terminal domains of a protein which direct the protein from its ribosomal assembly site to a particular cellular or extracellular location. Once the protein has been exported, removal of the signal sequence by a signal peptidase and posttranslational processing, e.g., glycosylation or phosphorylation, activate the protein. Signal peptidases exist as multi-subunit complexes in both yeast and mammals. The canine signal peptidase complex is composed of five subunits, all associated with the microsomal membrane and containing hydrophobic regions that span the membrane one or more times (Shelness, G. S. and G. Blobel (1990) J. Biol. Chem. 265:9512-9519). Some of these subunits serve to fix the complex in its proper position on the membrane while others contain the actual catalytic activity.

Another family of proteases that have a serine in their active site is dependent on the hydrolysis of ATP for activity. These proteases contain proteolytic core domains and regulatory ATPase domains, which can be identified by the presence of the P-loop, an ATP/GTP-binding motif (PROSITE POC00803). Members of this family include the eukaryotic mitochondrial matrix proteases, Clp protease, and the proteasome. Clp protease was originally found in plant chloroplasts but is believed to be widespread in both prokaryotic and eukaryotic cells. The gene for early-onset torsion dystonia encodes a protein related to Clp protease (Ozelius, L. J. et al. (1998) Adv. Neurol. 78:93-105).

The proteasome is an intracellular protease complex found in some bacteria and in all eukaryotic cells, and plays an important role in cellular physiology. Proteasomes are associated with the ubiquitin conjugation system (UCS), a major pathway for the degradation of cellular proteins of all types, including proteins that function to activate or repress cellular processes such as transcription and cell cycle progression (Ciechanover, A. (1994) Cell 79:13-21). In the UCS pathway, proteins targeted for degradation are conjugated to ubiquitin, a small, heat-stable protein. The ubiquitinated protein is then recognized and degraded by the proteasome. The resultant ubiquitin-peptide complex is hydrolyzed by a ubiquitin carboxyl terminal hydrolase, and free ubiquitin is released for reutilization by the UCS. Ubiquitin-proteasome systems are implicated in the degradation of mitotic cyclic kinases, oncoproteins, tumor suppressors (p53), cell surface receptors associated with signal transduction, transcriptional regulators, and mutated or damaged proteins (Ciechanover, supra). This pathway has been implicated in a number of diseases, including cystic fibrosis, Angelman's syndrome, and Liddle syndrome (reviewed in Schwartz, A. L. and A. Ciechanover (1999) Annu. Rev. Med. 50:57-74). A murine proto-oncogene, Unp, encodes a nuclear ubiquitin protease whose overexpression leads to oncogenic transformation of NIH3T3 cells. The human homologue of this gene is consistently elevated in small cell tumors and adenocarcinomas of the lung (Gray, D. A. (1995) Oncogene 10:2179-2183). Ubiquitin carboxyl terminal hydrolase is involved in the differentiation of a lymphoblastic leukemia cell line to a non-dividing mature state (Maki, A. et al. (1996) Differentiation 60:59-66). In neurons, ubiquitin carboxyl terminal hydrolase (PGP 9.5) expression is strong in the abnormal structures that occur in human neurodegenerative diseases (Lowe, J. et al. (1990) J. Pathol. 161:153-160).
The proteasome is a large (~2000 kDa) multisubunit complex composed of a central catalytic core containing a variety of proteases arranged in four seven-membered rings with the active sites facing inwards into the central cavity, and terminal ATPase subunits covering the outer port of the cavity and regulating substrate entry (for review, see Schmidt, M. et al. (1999) Curr. Opin. Chem. Biol. 3:584-591).

Cysteine Proteases

Cysteine proteases (CPs) are involved in diverse cellular processes ranging from the processing of precursor proteins to intracellular degradation. Nearly half of the CPs known are present only in viruses. CPs have a cysteine as the major catalytic residue at the active site, where catalysis proceeds via a thioester intermediate and is facilitated by nearby histidine and asparagine residues. A glutamine residue is also important, as it helps to form an oxyanion hole. Two important CP families are the papain-like enzymes (C1) and the calpains (C2). Papain-like family members are generally lysosomal or secreted and therefore are synthesized with signal peptides as well as propeptides. Most members bear a conserved motif in the propeptide that may have structural significance (Karrer, K. M. et al. (1993) Proc. Natl. Acad. Sci. USA 90:3063-3067). Three-dimensional structures of papain family members show a bilobed molecule with the catalytic site located between the two lobes. Papains include cathepsins B, C, H, L, and S, certain plant allergens, and dipeptidyl peptidase (for a review, see Rawlings, N. D. and A. J. Barrett (1994) Methods Enzymol. 244:461-486).

Some CPs are expressed ubiquitously, while others are produced only by cells of the immune system. Of particular note, CPs are produced by monocytes, macrophages and other cells which migrate to sites of inflammation and secrete molecules involved in tissue repair. Overabundance of these repair molecules plays a role in certain disorders. In autoimmune diseases such as rheumatoid arthritis, secretion of the cysteine peptidase cathepsin C degrades collagen, laminin, elastin and other structural proteins found in the extracellular matrix of bones. Bone weakened by such degradation is also more susceptible to tumor invasion and metastasis. Cathepsin L expression may also contribute to the influx of mononuclear cells which exacerbates the destruction of the rheumatoid synovium (Keyszer, G. M. (1995) Arthritis Rheum. 38:976-984).

Calpains are calcium-dependent cytosolic endopeptidases which contain both an N-terminal catalytic domain and a C-terminal calcium-binding domain. Calpain is expressed as a proenzyme heterodimer consisting of a catalytic subunit unique to each isoform and a regulatory subunit common to different isoforms. Each subunit bears a calcium-binding EF-hand domain. The regulatory subunit also contains a hydrophobic glycine-rich domain that allows the enzyme to associate with cell membranes. Calpains are activated by increased intracellular calcium concentration, which induces a change in conformation and limited autolysis. The resultant active molecule requires a lower calcium concentration for its activity (Chan, S. L. and M. P. Mattson (1999) J. Neurosci. Res. 58:167-190). Calpain expression is predominantly neuronal, although it is present in other tissues. Several chronic neurodegenerative disorders, including ALS, Parkinson's disease and Alzheimer's disease, are associated with increased calpain expression (Chan and Mattson, supra). Calpain-mediated breakdown of the cytoskeleton has been proposed to contribute to brain damage resulting from head injury (McCracken, E. et al. (1999) J. Neurotrauma 16:749-761). Calpain-3 is predominantly expressed in skeletal muscle, and is responsible for limb-girdle muscular dystrophy type 2A (Minami, N. et al. (1999) J. Neurol. Sci. 171:31-37).

Another family of thiol proteases is the caspases, which are involved in the initiation and execution phases of apoptosis. A pro-apoptotic signal can activate initiator caspases that trigger a proteolytic caspase cascade, leading to the hydrolysis of target proteins and the classic apoptotic death of the cell. Two active site residues, a cysteine and a histidine, have been implicated in the catalytic mechanism. Caspases are among the most specific endopeptidases, cleaving after aspartate residues. Caspases are synthesized as inactive zymogens consisting of one large (p20) and one small (p10) subunit separated by a small spacer region, and a variable N-terminal prodomain. This prodomain interacts with cofactors that can positively or negatively affect apoptosis. An activating signal causes autoproteolytic cleavage of a specific aspartate residue (D297 in the caspase-1 numbering convention) and removal of the spacer and prodomain, leaving a p10/p20 heterodimer. Two of these heterodimers interact via their small subunits to form the catalytically active tetramer. The long prodomains of some caspase family members have been shown to promote dimerization and auto-processing of procaspases. Some caspases contain a “death effector domain” in their prodomain by which they can be recruited into self-activating complexes with other caspases and FADD protein-associated death receptors or the TNF receptor complex. In addition, two dimers from different caspase family members can associate, changing the substrate specificity of the resultant tetramer. Endogenous caspase inhibitors (inhibitor of apoptosis proteins, or IAPs) also exist. All these interactions have clear effects on the control of apoptosis (reviewed in Chan and Mattson, supra; Salvesen, G. S. and V. M. Dixit (1999) Proc. Natl. Acad. Sci. USA 96:10964-10967).

Caspases have been implicated in a number of diseases. Mice lacking some caspases have severe nervous system defects due to failed apoptosis in the neuroepithelium and suffer early lethality. Others show severe defects in the inflammatory response, as caspases are responsible for processing IL-1β and possibly other inflammatory cytokines (Chan and Mattson, supra). Cowpox virus and baculoviruses target caspases to avoid the death of their host cell and promote successful infection. In addition, increases in inappropriate apoptosis have been reported in AIDS, neurodegenerative diseases and ischemic injury, while a decrease in cell death is associated with cancer (Salvesen and Dixit, supra; Thompson, C. B. (1995) Science 267:1456-1462).

Aspartyl Proteases

Aspartyl proteases (APs) include the lysosomal proteases cathepsins D and E, as well as chymosin, renin, and the gastric pepsins. Most retroviruses encode an AP, usually as part of the pol polyprotein. APs, also called acid proteases, are monomeric enzymes consisting of two domains, each domain containing one half of the active site with its own catalytic aspartic acid residue. APs are most active in the range of pH 2-3, at which one of the aspartate residues is ionized and the other neutral. The pepsin family of APs contains many secreted enzymes, and all are likely to be synthesized with signal peptides and propeptides. Most family members have three disulfide loops: the first, a ~5-residue loop following the first aspartate; the second, a 5-6-residue loop preceding the second aspartate; and the third and largest loop occurring toward the C terminus. Retropepsins, on the other hand, are analogous to a single domain of pepsin, and become active as homodimers with each retropepsin monomer contributing one half of the active site. Retropepsins are required for processing the viral polyproteins.
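The statement that APs work best at pH 2-3, where one catalytic aspartate is ionized and the other neutral, can be illustrated with the Henderson-Hasselbalch relation for the fraction of a carboxyl group that is deprotonated. The two pK_a values used below are hypothetical round numbers chosen only to show how two aspartates with well-separated pK_a values end up in opposite protonation states at the same pH; they are not measured values for any particular AP.

```latex
f_{\text{ionized}} = \frac{1}{1 + 10^{\,\mathrm{p}K_a - \mathrm{pH}}}

% Hypothetical pK_a values: pK_{a,1} = 1.5, pK_{a,2} = 4.5; pH = 2.5
f_1 = \frac{1}{1 + 10^{\,1.5 - 2.5}} = \frac{1}{1 + 10^{-1}} = \frac{1}{1.1} \approx 0.91

f_2 = \frac{1}{1 + 10^{\,4.5 - 2.5}} = \frac{1}{1 + 10^{2}} = \frac{1}{101} \approx 0.0099
```

Under these assumed values, about 91% of the first aspartate is ionized while the second remains about 99% protonated.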

APs have roles in various tissues, and some have been associated with disease. Renin mediates the first step in processing the hormone angiotensin, which is responsible for regulating electrolyte balance and blood pressure (reviewed in Crews, D. E. and S. R. Williams (1999) Hum. Biol. 71:475-503). Abnormal regulation and expression of cathepsins are evident in various inflammatory disease states. Expression of cathepsin D is elevated in synovial tissues from patients with rheumatoid arthritis and osteoarthritis. The increased expression and differential regulation of the cathepsins are linked to the metastatic potential of a variety of cancers (Chambers, A. F. et al. (1993) Crit. Rev. Oncol. 4:95-114).

Metalloproteases

Metalloproteases require a metal ion for activity, usually manganese or zinc. Examples of manganese metalloenzymes include aminopeptidase P and human proline dipeptidase (PEPD). Aminopeptidase P can degrade bradykinin, a nonapeptide activated in a variety of inflammatory responses. Aminopeptidase P has been implicated in coronary ischemia/reperfusion injury. Administration of aminopeptidase P inhibitors has been shown to have a cardioprotective effect in rats (Ersahin, C. et al. (1999) J. Cardiovasc. Pharmacol. 34:604-611).

Most zinc-dependent metalloproteases share a common sequence in the zinc-binding domain. The active site is made up of two histidines which act as zinc ligands and a catalytic glutamic acid C-terminal to the first histidine. Proteins containing this signature sequence are known as the metzincins and include aminopeptidase N, angiotensin-converting enzyme, neurolysin, the matrix metalloproteases and the adamalysins (ADAMs). An alternate sequence is found in the zinc carboxypeptidases, in which all three conserved residues (two histidines and a glutamic acid) are involved in zinc binding.
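The zinc-binding signature described above (two histidine zinc ligands with a catalytic glutamate immediately after the first histidine) is commonly written as the consensus HEXXH, where X is any residue. The Python sketch below scans a sequence for that consensus with a regular expression. It is a naive text search offered for illustration only: it reports chance matches, and it does not capture the extended metzincin motif or the zinc carboxypeptidase arrangement, for which curated profiles are normally used. The function name `find_hexxh` and the test sequences are hypothetical.

```python
import re

# HEXXH consensus: His, catalytic Glu, any two residues, His.
# The zero-width lookahead lets overlapping occurrences be reported.
HEXXH = re.compile(r"(?=(HE..H))")


def find_hexxh(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for each HEXXH occurrence."""
    return [(m.start() + 1, m.group(1)) for m in HEXXH.finditer(protein.upper())]
```

For example, `find_hexxh("HEAAHEAAH")` reports two overlapping hits, `[(1, "HEAAH"), (5, "HEAAH")]`, because the histidine at position 5 closes the first motif and opens the second.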

A number of the neutral metalloendopeptidases, including angiotensin-converting enzyme and the aminopeptidases, are involved in the metabolism of peptide hormones. High aminopeptidase B activity, for example, is found in the adrenal glands and neurohypophyses of hypertensive rats (Prieto, I. et al. (1998) Horm. Metab. Res. 30:246-248). Oligopeptidase M/neurolysin can hydrolyze bradykinin as well as neurotensin (Serizawa, A. et al. (1995) J. Biol. Chem. 270:2092-2098). Neurotensin is a vasoactive peptide that can act as a neurotransmitter in the brain, where it has been implicated in limiting food intake (Tritos, N. A. et al. (1999) Neuropeptides 33:339-349).

The matrix metalloproteases (MMPs) are a family of at least 23 enzymes that can degrade components of the extracellular matrix (ECM). They are Zn²⁺ endopeptidases with an N-terminal catalytic domain. Nearly all members of the family have a hinge peptide and C-terminal domain which can bind to substrate molecules in the ECM or to inhibitors produced by the tissue (TIMPs, for tissue inhibitor of metalloprotease; Campbell, I. L. et al. (1999) Trends Neurosci. 22:285). The presence of fibronectin-like repeats, transmembrane domains, or C-terminal hemopexin-like domains can be used to separate MMPs into collagenase, gelatinase, stromelysin and membrane-type MMP subfamilies. In the inactive form, the Zn²⁺ ion in the active site interacts with a cysteine in the pro-sequence. Activating factors disrupt the Zn²⁺-cysteine interaction, or “cysteine switch,” exposing the active site. This partially activates the enzyme, which then cleaves off its propeptide and becomes fully active. MMPs are often activated by the serine proteases plasmin and furin. MMPs are often regulated by stoichiometric, noncovalent interactions with inhibitors; the balance of protease to inhibitor, then, is very important in tissue homeostasis (reviewed in Yong, V. W. et al. (1998) Trends Neurosci. 21:75).
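Because MMP activity is set by a stoichiometric balance with inhibitors such as the TIMPs, the amount of enzyme left uninhibited can be estimated from total enzyme, total inhibitor, and the dissociation constant of the complex. The Python sketch below assumes simple 1:1 reversible binding and solves the standard mass-action quadratic (often called the Morrison equation) for the enzyme-inhibitor complex. The function name `free_enzyme` is hypothetical, and any concentrations or constants passed to it are illustrative, not values for a particular MMP/TIMP pair.

```python
import math


def free_enzyme(e_total: float, i_total: float, ki: float) -> float:
    """Free (uninhibited) enzyme concentration for 1:1 reversible binding.

    Solves the mass-action quadratic for the complex [EI] given total
    enzyme, total inhibitor, and dissociation constant ki (all in the
    same concentration units). The discriminant is never negative for ki >= 0.
    """
    s = e_total + i_total + ki
    complex_ei = (s - math.sqrt(s * s - 4.0 * e_total * i_total)) / 2.0
    return e_total - complex_ei
```

With a very tight inhibitor (`ki = 0`), `free_enzyme(10, 4, 0)` returns 6.0 and `free_enzyme(10, 12, 0)` returns 0.0: as long as inhibitor exceeds enzyme, no free activity remains, which is the stoichiometric behavior described above. A weak inhibitor (large `ki`) leaves most of the enzyme free.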

MMPs are implicated in a number of diseases including osteoarthritis (Mitchell, P. et al. (1996) J. Clin. Invest. 97:761), atherosclerotic plaque rupture (Sukhova, G. K. et al. (1999) Circulation 99:2503), aortic aneurysm (Schneiderman, J. et al. (1998) Am. J. Path. 152:703), non-healing wounds (Saarialho-Kere, U. K. et al. (1994) J. Clin. Invest. 94:79), bone resorption (Blavier, L. and J. M. Delaisse (1995) J. Cell Sci. 108:3649), age-related macular degeneration (Steen, B. et al. (1998) Invest. Ophthalmol. Vis. Sci. 39:2194), emphysema (Finlay, G. A. et al. (1997) Thorax 52:502), myocardial infarction (Rohde, L. E. et al. (1999) Circulation 99:3063) and dilated cardiomyopathy (Thomas, C. V. et al. (1998) Circulation 97:1708). MMP inhibitors prevent metastasis of mammary carcinoma and experimental tumors in rats, and of Lewis lung carcinoma, hemangioma, and human ovarian carcinoma xenografts in mice (Eccles, S. A. et al. (1996) Cancer Res. 56:2815; Anderson et al. (1996) Cancer Res. 56:715-718; Volpert, O. V. et al. (1996) J. Clin. Invest. 98:671; Taraboletti, G. et al. (1995) J. NCI 87:293; Davies, B. et al. (1993) Cancer Res. 53:2087). MMPs may be active in Alzheimer's disease. A number of MMPs are implicated in multiple sclerosis, and administration of MMP inhibitors can relieve some of its symptoms (reviewed in Yong, supra).

Another family of metalloproteases is the ADAMs, named for the A Disintegrin and Metalloprotease domain that they share with their close relatives the adamalysins, or snake venom metalloproteases (SVMPs). ADAMs combine features of both cell surface adhesion molecules and proteases, containing a prodomain, a protease domain, a disintegrin domain, a cysteine-rich domain, an epidermal growth factor repeat, a transmembrane domain, and a cytoplasmic tail. The first three domains listed above are also found in the SVMPs. The ADAMs possess four potential functions: proteolysis, adhesion, signaling and fusion. The ADAMs share the metzincin zinc-binding sequence and are inhibited by some MMP antagonists such as TIMP-1.

ADAMs are implicated in such processes as sperm-egg binding and fusion, myoblast fusion, and protein-ectodomain processing or shedding of cytokines, cytokine receptors, adhesion proteins and other extracellular protein domains (Schlöndorff, J. and C. P. Blobel (1999) J. Cell Sci. 112:3603-3617). The Kuzbanian protein cleaves a substrate in the NOTCH pathway (possibly NOTCH itself), activating the program for lateral inhibition in Drosophila neural development. Two ADAMs, TACE (ADAM 17) and ADAM 10, are proposed to have analogous roles in the processing of amyloid precursor protein in the brain (Schlöndorff and Blobel, supra). TACE has also been identified as the TNF-α-converting enzyme (Black, R. A. et al. (1997) Nature 385:729). TNF is a pleiotropic cytokine that is important in mobilizing host defenses in response to infection or trauma, but can cause severe damage in excess and is often overproduced in autoimmune disease. TACE cleaves membrane-bound pro-TNF to release a soluble form. Other ADAMs may be involved in a similar type of processing of other membrane-bound molecules.

The ADAMTS sub-family has all of the features of ADAM family metalloproteases and contains an additional thrombospondin domain (TS). The prototypic ADAMTS was identified in mouse, found to be expressed in heart and kidney, and shown to be upregulated by proinflammatory stimuli (Kuno, K. et al. (1997) J. Biol. Chem. 272:556-562). To date eleven members are recognized by the Human Genome Organization (HUGO; http://www.gene.ucl.ac.uk/users/hester/adamts.html#Approved). Members of this family have the ability to degrade aggrecan, a high molecular weight proteoglycan which provides cartilage with important mechanical properties including compressibility, and which is lost during the development of arthritis. Enzymes which degrade aggrecan are thus considered attractive targets to prevent and slow the degradation of articular cartilage (See, e.g., Tortorella, M. D. (1999) Science 284:1664; Abbaszade, I. (1999) J. Biol. Chem. 274:23443). Other members are reported to have antiangiogenic potential (Kuno et al., supra) and/or procollagen-processing activity (Colige, A. et al. (1997) Proc. Natl. Acad. Sci. USA 94:2374).

The discovery of new proteases, and the polynucleotides encoding them, satisfies a need in the art by providing new compositions which are useful in the diagnosis, prevention, and treatment of gastrointestinal, cardiovascular, autoimmune/inflammatory, cell proliferative, developmental, epithelial, neurological, and reproductive disorders, and in the assessment of the effects of exogenous compounds on the expression of nucleic acid and amino acid sequences of proteases.

SUMMARY OF THE INVENTION

The invention features purified polypeptides, proteases, referred to collectively as “PRTS” and individually as “PRTS-1,” “PRTS-2,” “PRTS-3,” “PRTS-4,” “PRTS-5,” “PRTS-6,” “PRTS-7,” “PRTS-8,” “PRTS-9,” “PRTS-10,” “PRTS-11,” “PRTS-12,” “PRTS-13,” “PRTS-14,” “PRTS-15,” “PRTS-16,” and “PRTS-17.” In one aspect, the invention provides an isolated polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-17. In one alternative, the invention provides an isolated polypeptide comprising the amino acid sequence of SEQ ID NO:1-17.
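The “at least 90% identical” criterion depends on how identity is counted, which this section does not specify. As one illustrative convention (an assumption, not necessarily the one applied to SEQ ID NO:1-17), the Python sketch below scores two sequences that have already been aligned to equal length, with gaps written as “-”, and divides the number of identical aligned residues by the number of non-gap residues in the reference sequence. Producing the alignment itself is outside the sketch, and the function names are hypothetical.

```python
def percent_identity(reference: str, query: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Gap characters are '-'. Identical residues at aligned positions are
    counted and divided by the number of non-gap residues in the reference.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    ref_residues = sum(1 for r in reference if r != "-")
    if ref_residues == 0:
        raise ValueError("reference contains no residues")
    matches = sum(1 for r, q in zip(reference, query) if r == q and r != "-")
    return 100.0 * matches / ref_residues


def meets_identity_threshold(reference: str, query: str, threshold: float = 90.0) -> bool:
    """True if the query is at least `threshold` percent identical to the reference."""
    return percent_identity(reference, query) >= threshold
```

For example, a single substitution in a 10-residue reference (`"ACDEFGHIKL"` vs `"ACDEFGHIKV"`) gives exactly 90.0 and passes, while two substitutions give 80.0 and fail. Other conventions (for instance, dividing by alignment length including gaps) can give different numbers for the same pair.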

The invention further provides an isolated polynucleotide encoding a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-17. In one alternative, the polynucleotide encodes a polypeptide selected from the group consisting of SEQ ID NO:1-17. In another alternative, the polynucleotide is selected from the group consisting of SEQ ID NO:18-34.

Additionally, the invention provides a recombinant polynucleotide comprising a promoter sequence operably linked to a polynucleotide encoding a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-17. In one alternative, the invention provides a cell transformed with the recombinant polynucleotide. In another alternative, the invention provides a transgenic organism comprising the recombinant polynucleotide.

The invention also provides a method for producing a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-17. The method comprises a) culturing a cell under conditions suitable for expression of the polypeptide, wherein said cell is transformed with a recombinant polynucleotide comprising a promoter sequence operably linked to a polynucleotide encoding the polypeptide, and b) recovering the polypeptide so expressed.

Additionally, the invention provides an isolated antibody which specifically binds to a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-17.

The invention further provides an isolated polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:18-34, b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:18-34, c) a polynucleotide complementary to the polynucleotide of a), d) a polynucleotide complementary to the polynucleotide of b), and e) an RNA equivalent of a)-d). In one alternative, the polynucleotide comprises at least 60 contiguous nucleotides.
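The complementary and RNA-equivalent forms listed in items c) through e) can be derived mechanically from a DNA sequence. The Python sketch below assumes that “complementary” means the antiparallel (reverse) complement, the usual convention for a strand that hybridizes to the original, and that the input contains only the unambiguous bases A, C, G, and T; IUPAC ambiguity codes are not handled. It also enumerates windows of a chosen length, such as the 60 contiguous nucleotides mentioned above. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Antiparallel complementary strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def rna_equivalent(dna: str) -> str:
    """Same sequence with thymine (T) replaced by uracil (U)."""
    return dna.replace("T", "U").replace("t", "u")


def contiguous_windows(seq: str, length: int = 60) -> list[str]:
    """All substrings of exactly `length` contiguous nucleotides (empty if seq is shorter)."""
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]
```

For example, `reverse_complement("ATGCGT")` gives `"ACGCAT"`, and `rna_equivalent("ATGCGT")` gives `"AUGCGU"`. The RNA equivalent of a complementary strand is obtained by composing the two functions.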

Additionally, the invention provides a method for detecting a target polynucleotide in a sample, said target polynucleotide having a sequence of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:18-34, b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:18-34, c) a polynucleotide complementary to the polynucleotide of a), d) a polynucleotide complementary to the polynucleotide of b), and e) an RNA equivalent of a)-d). The method comprises a) hybridizing the sample with a probe comprising at least 20 contiguous nucleotides comprising a sequence complementary to said target polynucleotide in the sample, and which probe specifically hybridizes to said target polynucleotide, under conditions whereby a hybridization complex is formed between said probe and said target polynucleotide or fragments thereof, and b) detecting the presence or absence of said hybridization complex, and optionally, if present, the amount thereof. In one alternative, the probe comprises at least 60 contiguous nucleotides.
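By way of illustration only, the exact-match core of the probe-based detection step can be sketched in Python. The sketch assumes that specific hybridization is reduced to perfect base-pairing between a DNA probe of at least 20 nucleotides and some region of the target; actual hybridization also tolerates mismatches and depends on temperature, salt, and wash stringency. The function names and the example sequence are hypothetical.

```python
# Hypothetical helpers: an exact-match model of probe hybridization only.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.translate(COMPLEMENT)[::-1]


def probe_hybridizes(probe: str, target: str, min_len: int = 20) -> bool:
    """Return True if the probe is at least `min_len` nt long and could pair
    base-for-base, antiparallel, with some region of `target`.

    Mismatch tolerance, temperature, and salt are deliberately ignored.
    """
    if len(probe) < min_len:
        return False
    # A probe anneals to the region whose sequence equals its reverse complement.
    return reverse_complement(probe.upper()) in target.upper()


if __name__ == "__main__":
    target = "GGATCCATGGCTAAAGGTGAACTGCTGTTCACCGGTACC"  # illustrative sequence
    probe = reverse_complement(target[5:30])  # 25-nt probe for positions 6-30
    print(probe_hybridizes(probe, target))
    print(probe_hybridizes("A" * 25, target))
```

Because the probe anneals antiparallel to the target, the check searches the target for the probe's reverse complement rather than for the probe itself.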

The invention further provides a method for detecting a target polynucleotide in a sample, said target polynucleotide having a sequence of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:18-34, b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:18-34, c) a polynucleotide complementary to the polynucleotide of a), d) a polynucleotide complementary to the polynucleotide of b), and e) an RNA equivalent of a)-d). The method comprises a) amplifying said target polynucleotide or fragment thereof using polymerase chain reaction amplification, and b) detecting the presence or absence of said amplified target polynucleotide or fragment thereof, and, optionally, if present, the amount thereof.

The invention further provides a composition comprising an effective amount of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, and a pharmaceutically acceptable excipient. In one embodiment, the composition comprises an amino acid sequence selected from the group consisting of SEQ ID NO:1-17. The invention additionally provides a method of treating a disease or condition associated with decreased expression of functional PRTS, comprising administering to a patient in need of such treatment the composition.

The invention also provides a method for screening a compound for effectiveness as an agonist of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-17. The method comprises a) exposing a sample comprising the polypeptide to a compound, and b) detecting agonist activity in the sample. In one alternative, the invention provides a composition comprising an agonist compound identified by the method and a pharmaceutically acceptable excipient. In another alternative, the invention provides a method of treating a disease or condition associated with decreased expression of functional PRTS, comprising administering to a patient in need of such treatment the composition.

Additionally, the invention provides a method for screening a compound for effectiveness as an antagonist of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-17. The method comprises a) exposing a sample comprising the polypeptide to a compound, and b) detecting antagonist activity in the sample. In one alternative, the invention provides a composition comprising an antagonist compound identified by the method and a pharmaceutically acceptable excipient. In another alternative, the invention provides a method of treating a disease or condition associated with overexpression of functional PRTS, comprising administering to a patient in need of such treatment the composition.

The invention further provides a method of screening for a compound that specifically binds to a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-17. The method comprises a) combining the polypeptide with at least one test compound under suitable conditions, and b) detecting binding of the polypeptide to the test compound, thereby identifying a compound that specifically binds to the polypeptide.

The invention further provides a method of screening for a compound that modulates the activity of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-17, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-17. The method comprises a) combining the polypeptide with at least one test compound under conditions permissive for the activity of the polypeptide, b) assessing the activity of the polypeptide in the presence of the test compound, and c) comparing the activity of the polypeptide in the presence of the test compound with the activity of the polypeptide in the absence of the test compound, wherein a change in the activity of the polypeptide in the presence of the test compound is indicative of a compound that modulates the activity of the polypeptide.
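The comparison in step c) can be illustrated with a minimal Python sketch. It assumes a single scalar activity reading for each condition and uses a hypothetical 20% cut-off to call a change; neither the metric nor the threshold comes from this disclosure, and a real screen would use replicate measurements and a statistical test.

```python
def percent_change(activity_with: float, activity_without: float) -> float:
    """Percent change in activity relative to the no-compound control.

    Negative values indicate reduced activity, positive values increased activity.
    """
    if activity_without == 0:
        raise ValueError("control activity must be non-zero")
    return 100.0 * (activity_with - activity_without) / activity_without


def classify_compound(activity_with: float, activity_without: float,
                      threshold_pct: float = 20.0) -> str:
    """Label a compound using a hypothetical +/- threshold_pct cut-off."""
    change = percent_change(activity_with, activity_without)
    if change <= -threshold_pct:
        return "decreases activity"
    if change >= threshold_pct:
        return "increases activity"
    return "no change called"


if __name__ == "__main__":
    # Hypothetical activity readings (arbitrary units) with and without compound.
    print(classify_compound(42.0, 120.0))
```

A compound that decreases activity would be a candidate antagonist and one that increases it a candidate agonist, in the sense of the screening methods described above.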

The invention further provides a method for screening a compound for effectiveness in altering expression of a target polynucleotide, wherein said target polynucleotide comprises a polynucleotide sequence selected from the group consisting of SEQ ID NO:18-34, the method comprising a) exposing a sample comprising the target polynucleotide to a compound, and b) detecting altered expression of the target polynucleotide.

The invention further provides a method for assessing toxicity of a test compound, said method comprising a) treating a biological sample containing nucleic acids with the test compound; b) hybridizing the nucleic acids of the treated biological sample with a probe comprising at least 20 contiguous nucleotides of a polynucleotide selected from the group consisting of i) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:18-34, ii) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:18-34, iii) a polynucleotide having a sequence complementary to i), iv) a polynucleotide complementary to the polynucleotide of ii), and v) an RNA equivalent of i)-iv). Hybridization occurs under conditions whereby a specific hybridization complex is formed between said probe and a target polynucleotide in the biological sample, said target polynucleotide selected from the group consisting of i) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:18-34, ii) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:18-34, iii) a polynucleotide complementary to the polynucleotide of i), iv) a polynucleotide complementary to the polynucleotide of ii), and v) an RNA equivalent of i)-iv). Alternatively, the target polynucleotide comprises a fragment of a polynucleotide sequence selected from the group consisting of i)-v) above; c) quantifying the amount of hybridization complex; and d) comparing the amount of hybridization complex in the treated biological sample with the amount of hybridization complex in an untreated biological sample, wherein a difference in the amount of hybridization complex in the treated biological sample is indicative of toxicity of the test compound.
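Steps c) and d) amount to comparing hybridization signals between treated and untreated samples. The following minimal Python sketch assumes one positive signal value per probe and a hypothetical two-fold criterion (an absolute log2 ratio of at least 1) for calling a difference; the probe identifiers and signal values are illustrative only and are not data from this disclosure.

```python
import math


def log2_ratio(treated: float, untreated: float) -> float:
    """log2(treated / untreated); both hybridization signals must be positive."""
    if treated <= 0 or untreated <= 0:
        raise ValueError("hybridization signals must be positive")
    return math.log2(treated / untreated)


def flag_changed_probes(treated: dict[str, float],
                        untreated: dict[str, float],
                        min_abs_log2: float = 1.0) -> dict[str, float]:
    """Return {probe_id: log2 ratio} for probes whose signal differs by at
    least 2**min_abs_log2-fold between treated and untreated samples."""
    flagged = {}
    for probe_id, signal in treated.items():
        ratio = log2_ratio(signal, untreated[probe_id])
        if abs(ratio) >= min_abs_log2:
            flagged[probe_id] = ratio
    return flagged


if __name__ == "__main__":
    # Hypothetical probe IDs and signal intensities.
    treated = {"probe_A": 800.0, "probe_B": 410.0, "probe_C": 95.0}
    untreated = {"probe_A": 200.0, "probe_B": 400.0, "probe_C": 400.0}
    print(flag_changed_probes(treated, untreated))
```

Working on a log2 scale makes increases and decreases of the same fold size symmetric around zero, which is convenient when either direction of change may indicate toxicity.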

BRIEF DESCRIPTION OF THE TABLES

Table 1 summarizes the nomenclature for the full length polynucleotide and polypeptide sequences of the present invention.

Table 2 shows the GenBank identification number and annotation of the nearest GenBank homolog for polypeptides of the invention. The probability score for the match between each polypeptide and its GenBank homolog is also shown.

Table 3 shows structural features of polypeptide sequences of the invention, including predicted motifs and domains, along with the methods, algorithms, and searchable databases used for analysis of the polypeptides.

Table 4 lists the cDNA and/or genomic DNA fragments which were used to assemble polynucleotide sequences of the invention, along with selected fragments of the polynucleotide sequences.

Table 5 shows the representative cDNA library for polynucleotides of the invention.

Table 6 provides an appendix which describes the tissues and vectors used for construction of the cDNA libraries shown in Table 5.

Table 7 shows the tools, programs, and algorithms used to analyze the polynucleotides and polypeptides of the invention, along with applicable descriptions, references, and threshold parameters.

DESCRIPTION OF THE INVENTION

Before the present proteins, nucleotide sequences, and methods are described, it is understood that this invention is not limited to the particular machines, materials and methods described, as these may vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention which will be limited only by the appended claims.

It must be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural reference unless the context clearly dictates otherwise. Thus, for example, a reference to “a host cell” includes a plurality of such host cells, and a reference to “an antibody” is a reference to one or more antibodies and equivalents thereof known to those skilled in the art, and so forth.

Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any machines, materials, and methods similar or equivalent to those described herein can be used to practice or test the present invention, the preferred machines, materials and methods are now described. All publications mentioned herein are cited for the purpose of describing and disclosing the cell lines, protocols, reagents and vectors which are reported in the publications and which might be used in connection with the invention. Nothing herein is to be construed as an admission that the invention is not entitled to antedate such disclosure by virtue of prior invention.

Definitions

“PRTS” refers to the amino acid sequences of substantially purified PRTS obtained from any species, particularly a mammalian species, including bovine, ovine, porcine, murine, equine, and human, and from any source, whether natural, synthetic, semi-synthetic, or recombinant.

The term “agonist” refers to a molecule which intensifies or mimics the biological activity of PRTS. Agonists may include proteins, nucleic acids, carbohydrates, small molecules, or any other compound or composition which modulates the activity of PRTS either by directly interacting with PRTS or by acting on components of the biological pathway in which PRTS participates.

An “allelic variant” is an alternative form of the gene encoding PRTS. Allelic variants may result from at least one mutation in the nucleic acid sequence and may result in altered mRNAs or in polypeptides whose structure or function may or may not be altered. A gene may have none, one, or many allelic variants of its naturally occurring form. Common mutational changes which give rise to allelic variants are generally ascribed to natural deletions, additions, or substitutions of nucleotides. Each of these types of changes may occur alone, or in combination with the others, one or more times in a given sequence.

“Altered” nucleic acid sequences encoding PRTS include those sequences with deletions, insertions, or substitutions of different nucleotides, resulting in a polypeptide the same as PRTS or a polypeptide with at least one functional characteristic of PRTS. Included within this definition are polymorphisms which may or may not be readily detectable using a particular oligonucleotide probe of the polynucleotide encoding PRTS, and improper or unexpected hybridization to allelic variants, with a locus other than the normal chromosomal locus for the polynucleotide sequence encoding PRTS. The encoded protein may also be “altered,” and may contain deletions, insertions, or substitutions of amino acid residues which produce a silent change and result in a functionally equivalent PRTS. Deliberate amino acid substitutions may be made on the basis of similarity in polarity, charge, solubility, hydrophobicity, hydrophilicity, and/or the amphipathic nature of the residues, as long as the biological or immunological activity of PRTS is retained. For example, negatively charged amino acids may include aspartic acid and glutamic acid, and positively charged amino acids may include lysine and arginine. Amino acids with uncharged polar side chains having similar hydrophilicity values may include: asparagine and glutamine; and serine and threonine. Amino acids with uncharged side chains having similar hydrophilicity values may include: leucine, isoleucine, and valine; glycine and alanine; and phenylalanine and tyrosine.

The terms “amino acid” and “amino acid sequence” refer to an oligopeptide, peptide, polypeptide, or protein sequence, or a fragment of any of these, and to naturally occurring or synthetic molecules. Where “amino acid sequence” is recited to refer to a sequence of a naturally occurring protein molecule, “amino acid sequence” and like terms are not meant to limit the amino acid sequence to the complete native amino acid sequence associated with the recited protein molecule.

“Amplification” relates to the production of additional copies of a nucleic acid sequence. Amplification is generally carried out using polymerase chain reaction (PCR) technologies well known in the art.
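For orientation, the idealized exponential phase of PCR is often written as N = N0 × (1 + E)^n, where N0 is the starting copy number, E the per-cycle efficiency (1.0 meaning perfect doubling), and n the number of cycles. The Python sketch below assumes this textbook model, which ignores the plateau reached in late cycles; the function names are hypothetical.

```python
import math


def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after `cycles` rounds under the idealized model
    N = N0 * (1 + E) ** n, with E in [0, 1] (1.0 = perfect doubling)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_needed(initial_copies: float, target_copies: float,
                  efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles for which the model reaches target_copies."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    if target_copies <= initial_copies:
        return 0
    n = math.ceil(math.log(target_copies / initial_copies) / math.log(1.0 + efficiency))
    # Guard against floating-point round-up when the ratio is an exact power.
    while n > 0 and pcr_copies(initial_copies, n - 1, efficiency) >= target_copies:
        n -= 1
    return n
```

Under this model, 1,000 starting copies need 20 perfect cycles to exceed 10^9 copies, since 1,000 × 2^20 is about 1.05 × 10^9.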

The term “antagonist” refers to a molecule which inhibits or attenuates the biological activity of PRTS. Antagonists may include proteins such as antibodies, nucleic acids, carbohydrates, small molecules, or any other compound or composition which modulates the activity of PRTS either by directly interacting with PRTS or by acting on components of the biological pathway in which PRTS participates.

The term “antibody” refers to intact immunoglobulin molecules as well as to fragments thereof, such as Fab, F(ab′)₂, and Fv fragments, which are capable of binding an epitopic determinant. Antibodies that bind PRTS polypeptides can be prepared using intact polypeptides or using fragments containing small peptides of interest as the immunizing antigen. The polypeptide or oligopeptide used to immunize an animal (e.g., a mouse, a rat, or a rabbit) can be derived from the translation of RNA, or synthesized chemically, and can be conjugated to a carrier protein if desired. Commonly used carriers that are chemically coupled to peptides include bovine serum albumin, thyroglobulin, and keyhole limpet hemocyanin (KLH). The coupled peptide is then used to immunize the animal.

The term “antigenic determinant” refers to that region of a molecule (i.e., an epitope) that makes contact with a particular antibody. When a protein or a fragment of a protein is used to immunize a host animal, numerous regions of the protein may induce the production of antibodies which bind specifically to antigenic determinants (particular regions or three-dimensional structures on the protein). An antigenic determinant may compete with the intact antigen (i.e., the immunogen used to elicit the immune response) for binding to an antibody.

The term “antisense” refers to any composition capable of base-pairing with the “sense” (coding) strand of a specific nucleic acid sequence. Antisense compositions may include DNA; RNA; peptide nucleic acid (PNA); oligonucleotides having modified backbone linkages such as phosphorothioates, methylphosphonates, or benzylphosphonates; oligonucleotides having modified sugar groups such as 2′-methoxyethyl sugars or 2′-methoxyethoxy sugars; or oligonucleotides having modified bases such as 5-methyl cytosine, 2′-deoxyuracil, or 7-deaza-2′-deoxyguanosine. Antisense molecules may be produced by any method including chemical synthesis or transcription. Once introduced into a cell, the complementary antisense molecule base-pairs with a naturally occurring nucleic acid sequence produced by the cell to form duplexes which block either transcription or translation. The designation “negative” or “minus” can refer to the antisense strand, and the designation “positive” or “plus” can refer to the sense strand of a reference DNA molecule.

The term “biologically active” refers to a protein having structural, regulatory, or biochemical functions of a naturally occurring molecule. Likewise, “immunologically active” or “immunogenic” refers to the capability of the natural, recombinant, or synthetic PRTS, or of any oligopeptide thereof, to induce a specific immune response in appropriate animals or cells and to bind with specific antibodies.

“Complementary” describes the relationship between two single-stranded nucleic acid sequences that anneal by base-pairing. For example, 5′-AGT-3′ pairs with its complement, 3′-TCA-5′.

A “composition comprising a given polynucleotide sequence” and a “composition comprising a given amino acid sequence” refer broadly to any composition containing the given polynucleotide or amino acid sequence. The composition may comprise a dry formulation or an aqueous solution. Compositions comprising polynucleotide sequences encoding PRTS or fragments of PRTS may be employed as hybridization probes. The probes may be stored in freeze-dried form and may be associated with a stabilizing agent such as a carbohydrate. In hybridizations, the probe may be deployed in an aqueous solution containing salts (e.g., NaCl), detergents (e.g., sodium dodecyl sulfate; SDS), and other components (e.g., Denhardt's solution, dry milk, salmon sperm DNA, etc.).

“Consensus sequence” refers to a nucleic acid sequence which has been subjected to repeated DNA sequence analysis to resolve uncalled bases, extended using the XL-PCR kit (Applied Biosystems, Foster City, Calif.) in the 5′ and/or the 3′ direction, and resequenced, or which has been assembled from one or more overlapping cDNA, EST, or genomic DNA fragments using a computer program for fragment assembly, such as the GELVIEW fragment assembly system (GCG, Madison Wis.) or Phrap (University of Washington, Seattle Wash.). Some sequences have been both extended and assembled to produce the consensus sequence.

“Conservative amino acid substitutions” are those substitutions that are predicted to least interfere with the properties of the original protein, i.e., the structure and especially the function of the protein is conserved and not significantly changed by such substitutions. The table below shows amino acids which may be substituted for an original amino acid in a protein and which are regarded as conservative amino acid substitutions.

Original Residue    Conservative Substitution
Ala                 Gly, Ser
Arg                 His, Lys
Asn                 Asp, Gln, His
Asp                 Asn, Glu
Cys                 Ala, Ser
Gln                 Asn, Glu, His
Glu                 Asp, Gln, His
Gly                 Ala
His                 Asn, Arg, Gln, Glu
Ile                 Leu, Val
Leu                 Ile, Val
Lys                 Arg, Gln, Glu
Met                 Leu, Ile
Phe                 His, Met, Leu, Trp, Tyr
Ser                 Cys, Thr
Thr                 Ser, Val
Trp                 Phe, Tyr
Tyr                 His, Phe, Trp
Val                 Ile, Leu, Thr

Conservative amino acid substitutions generally maintain (a) the structure of the polypeptide backbone in the area of the substitution, for example, as a beta sheet or alpha helical conformation, (b) the charge or hydrophobicity of the molecule at the site of the substitution, and/or (c) the bulk of the side chain.
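The table of conservative substitutions can be encoded as a lookup to check whether a given residue change counts as conservative under this definition. In this minimal Python sketch, the dictionary transcribes the table directionally (original residue to listed substitutes), so, for example, Phe to His is listed but His to Phe is not. The helper names are hypothetical.

```python
# Directional transcription of the table: original residue -> listed substitutes.
CONSERVATIVE = {
    "Ala": {"Gly", "Ser"},
    "Arg": {"His", "Lys"},
    "Asn": {"Asp", "Gln", "His"},
    "Asp": {"Asn", "Glu"},
    "Cys": {"Ala", "Ser"},
    "Gln": {"Asn", "Glu", "His"},
    "Glu": {"Asp", "Gln", "His"},
    "Gly": {"Ala"},
    "His": {"Asn", "Arg", "Gln", "Glu"},
    "Ile": {"Leu", "Val"},
    "Leu": {"Ile", "Val"},
    "Lys": {"Arg", "Gln", "Glu"},
    "Met": {"Leu", "Ile"},
    "Phe": {"His", "Met", "Leu", "Trp", "Tyr"},
    "Ser": {"Cys", "Thr"},
    "Thr": {"Ser", "Val"},
    "Trp": {"Phe", "Tyr"},
    "Tyr": {"His", "Phe", "Trp"},
    "Val": {"Ile", "Leu", "Thr"},
}


def is_conservative(original: str, substitute: str) -> bool:
    """True if the table lists `substitute` as conservative for `original`."""
    return substitute in CONSERVATIVE.get(original, set())


def describe_substitutions(wild_type: list[str],
                           variant: list[str]) -> list[tuple[int, str, str, bool]]:
    """For two equal-length, ungapped sequences in three-letter code, return
    (1-based position, original, substitute, conservative?) for each change."""
    if len(wild_type) != len(variant):
        raise ValueError("sequences must be the same length")
    changes = []
    for pos, (orig, sub) in enumerate(zip(wild_type, variant), start=1):
        if orig != sub:
            changes.append((pos, orig, sub, is_conservative(orig, sub)))
    return changes
```

For example, comparing Met-Ala-Lys-Leu with Met-Gly-Lys-Asp reports Ala to Gly at position 2 as conservative and Leu to Asp at position 4 as not.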

A “deletion” refers to a change in the amino acid or nucleotide sequence that results in the absence of one or more amino acid residues or nucleotides.

The term “derivative” refers to a chemically modified polynucleotide or polypeptide. Chemical modifications of a polynucleotide can include, for example, replacement of hydrogen by an alkyl, acyl, hydroxyl, or amino group. A derivative polynucleotide encodes a polypeptide which retains at least one biological or immunological function of the natural molecule. A derivative polypeptide is one modified by glycosylation, pegylation, or any similar process that retains at least one biological or immunological function of the polypeptide from which it was derived.

A “detectable label” refers to a reporter molecule or enzyme that is capable of generating a measurable signal and is covalently or noncovalently joined to a polynucleotide or polypeptide.

“Differential expression” refers to increased or upregulated; or decreased, downregulated, or absent gene or protein expression, determined by comparing at least two different samples. Such comparisons may be carried out between, for example, a treated and an untreated sample, or a diseased and a normal sample.

“Exon shuffling” refers to the recombination of different coding regions (exons). Since an exon may represent a structural or functional domain of the encoded protein, new proteins may be assembled through the novel reassortment of stable substructures, thus allowing acceleration of the evolution of new protein functions.

A “fragment” is a unique portion of PRTS or the polynucleotide encoding PRTS which is identical in sequence to but shorter in length than the parent sequence. A fragment may comprise up to the entire length of the defined sequence, minus one nucleotide/amino acid residue. For example, a fragment may comprise from 5 to 1000 contiguous nucleotides or amino acid residues. A fragment used as a probe, primer, antigen, therapeutic molecule, or for other purposes, may be at least 5, 10, 15, 16, 20, 25, 30, 40, 50, 60, 75, 100, 150, 250 or at least 500 contiguous nucleotides or amino acid residues in length. Fragments may be preferentially selected from certain regions of a molecule. For example, a polypeptide fragment may comprise a certain length of contiguous amino acids selected from the first 250 or 500 amino acids (or first 25% or 50%) of a polypeptide as shown in a certain defined sequence. Clearly these lengths are exemplary, and any length that is supported by the specification, including the Sequence Listing, tables, and figures, may be encompassed by the present embodiments.

A fragment of SEQ ID NO:18-34 comprises a region of unique polynucleotide sequence that specifically identifies SEQ ID NO:18-34, for example, as distinct from any other sequence in the genome from which the fragment was obtained. A fragment of SEQ ID NO:18-34 is useful, for example, in hybridization and amplification technologies and in analogous methods that distinguish SEQ ID NO:18-34 from related polynucleotide sequences. The precise length of a fragment of SEQ ID NO:18-34 and the region of SEQ ID NO:18-34 to which the fragment corresponds are routinely determinable by one of ordinary skill in the art based on the intended purpose for the fragment.
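One simple way to find candidate regions that specifically identify a sequence is to list every k-mer of the query that does not occur, on either strand, in a set of background sequences. The Python sketch below assumes exact-match uniqueness and a small, hypothetical background list; in practice the background would be an entire genome or transcriptome, and near-matches would also need to be excluded.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def unique_kmers(query: str, background: list[str],
                 k: int = 20) -> list[tuple[int, str]]:
    """Return (0-based start, k-mer) for every k-mer of `query` that occurs
    on neither strand of any background sequence (exact matching only)."""
    seen = set()
    for seq in background:
        # Index both strands, since a probe could hybridize to either one.
        for strand in (seq, reverse_complement(seq)):
            for i in range(len(strand) - k + 1):
                seen.add(strand[i:i + k])
    return [(i, query[i:i + k])
            for i in range(len(query) - k + 1)
            if query[i:i + k] not in seen]
```

The default k of 20 mirrors the minimum probe length recited elsewhere in this description; it is a parameter of the sketch, not a requirement for a fragment.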

A fragment of SEQ ID NO:1-17 is encoded by a fragment of SEQ ID NO:18-34. A fragment of SEQ ID NO:1-17 comprises a region of unique amino acid sequence that specifically identifies SEQ ID NO:1-17. For example, a fragment of SEQ ID NO:1-17 is useful as an immunogenic peptide for the development of antibodies that specifically recognize SEQ ID NO:1-17. The precise length of a fragment of SEQ ID NO:1-17 and the region of SEQ ID NO:1-17 to which the fragment corresponds are routinely determinable by one of ordinary skill in the art based on the intended purpose for the fragment.

A “full length” polynucleotide sequence is one containing at least a translation initiation codon (e.g., the codon for methionine) followed by an open reading frame and a translation termination codon. A “full length” polynucleotide sequence encodes a “full length” polypeptide sequence.

“Homology” refers to sequence similarity or, interchangeably, sequence identity, between two or more polynucleotide sequences or two or more polypeptide sequences.

The terms “percent identity” and “% identity,” as applied to polynucleotide sequences, refer to the percentage of residue matches between at least two polynucleotide sequences aligned using a standardized algorithm. Such an algorithm may insert, in a standardized and reproducible way, gaps in the sequences being compared in order to optimize alignment between two sequences, and therefore achieve a more meaningful comparison of the two sequences.
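To illustrate how gap insertion and percent identity interact, the Python sketch below implements the textbook Needleman-Wunsch global alignment with a linear gap penalty and then counts identical columns. It is not CLUSTAL V or BLAST; the scoring values (match +1, mismatch −1, gap −2) are assumptions chosen for illustration, and percent identity is computed over all alignment columns including gaps, which is only one of several conventions.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty.
    Returns the two aligned strings, with '-' marking inserted gaps."""
    n, m = len(a), len(b)

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns as a percentage of all alignment columns."""
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("need two non-empty alignment rows of equal length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)
```

Aligning GATTACA with GATACA under these scores places one gap and yields 6 identical columns out of 7. Dividing instead by the length of the shorter sequence, or by ungapped columns only, would give a different percentage for the same alignment, which is one reason to state the algorithm and parameters alongside any identity value.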

Percent identity between polynucleotide sequences may be determined using the default parameters of the CLUSTAL V algorithm as incorporated into the MEGALIGN version 3.12e sequence alignment program. This program is part of the LASERGENE software package, a suite of molecular biological analysis programs (DNASTAR, Madison Wis.). CLUSTAL V is described in Higgins, D. G. and P. M. Sharp (1989) CABIOS 5:151-153 and in Higgins, D. G. et al. (1992) CABIOS 8:189-191. For pairwise alignments of polynucleotide sequences, the default parameters are set as follows: Ktuple=2, gap penalty=5, window=4, and “diagonals saved”=4. The “weighted” residue weight table is selected as the default. Percent identity is reported by CLUSTAL V as the “percent similarity” between aligned polynucleotide sequences.

Alternatively, a suite of commonly used and freely available sequence comparison algorithms is provided by the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST) (Altschul, S. F. et al. (1990) J. Mol. Biol. 215:403-410), which is available from several sources, including the NCBI, Bethesda, Md., and on the Internet at http://www.ncbi.nlm.nih.gov/BLAST/. The BLAST software suite includes various sequence analysis programs including “blastn,” which is used to align a known polynucleotide sequence with other polynucleotide sequences from a variety of databases. Also available is a tool called “BLAST 2 Sequences” that is used for direct pairwise comparison of two nucleotide sequences. “BLAST 2 Sequences” can be accessed and used interactively at http://www.ncbi.nlm.nih.gov/gorf/bl2.html. The “BLAST 2 Sequences” tool can be used for both blastn and blastp (discussed below). BLAST programs are commonly used with gap and other parameters set to default settings. For example, to compare two nucleotide sequences, one may use blastn with the “BLAST 2 Sequences” tool Version 2.0.12 (Apr. 21, 2000) set at default parameters. Such default parameters may be, for example:

Matrix: BLOSUM62

Reward for match: 1

Penalty for mismatch: −2

Open Gap: 5 and Extension Gap: 2 penalties

Gap x drop-off: 50

Expect: 10

Word Size: 11

Filter: on

Percent identity may be measured over the length of an entire defined sequence, for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined sequence, for instance, a fragment of at least 20, at least 30, at least 40, at least 50, at least 70, at least 100, or at least 200 contiguous nucleotides. Such lengths are exemplary only, and it is understood that any fragment length supported by the sequences shown herein, in the tables, figures, or Sequence Listing, may be used to describe a length over which percentage identity may be measured.

Nucleic acid sequences that do not show a high degree of identity may nevertheless encode similar amino acid sequences due to the degeneracy of the genetic code. It is understood that changes in a nucleic acid sequence can be made using this degeneracy to produce multiple nucleic acid sequences that all encode substantially the same protein.
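The degeneracy of the genetic code can be made concrete with the standard code. The Python sketch below builds the 64-codon table and compares two hypothetical coding sequences that share only 7 of 9 nucleotides yet encode the same tripeptide (Met-Ala-Lys); the function names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG;
# '*' marks the three stop codons.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence codon by codon ('*' = stop).
    Trailing bases that do not form a full codon are ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[p:p + 3]] for p in range(0, len(dna) - 2, 3))


def nucleotide_identity(a: str, b: str) -> float:
    """Percent identical positions for two equal-length, ungapped sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    seq1, seq2 = "ATGGCTAAA", "ATGGCGAAG"   # hypothetical coding sequences
    print(translate(seq1), translate(seq2), nucleotide_identity(seq1, seq2))
```

Both sequences translate to MAK even though they are only about 77.8% identical at the nucleotide level, because GCT and GCG both encode alanine and AAA and AAG both encode lysine.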

The phrases “percent identity” and “% identity,” as applied to polypeptide sequences, refer to the percentage of residue matches between at least two polypeptide sequences aligned using a standardized algorithm. Methods of polypeptide sequence alignment are well-known. Some alignment methods take into account conservative amino acid substitutions. Such conservative substitutions, explained in more detail above, generally preserve the charge and hydrophobicity at the site of substitution, thus preserving the structure (and therefore function) of the polypeptide.

Percent identity between polypeptide sequences may be determined using the default parameters of the CLUSTAL V algorithm as incorporated into the MEGALIGN version 3.12e sequence alignment program (described and referenced above). For pairwise alignments of polypeptide sequences using CLUSTAL V, the default parameters are set as follows: Ktuple=1, gap penalty=3, window=5, and “diagonals saved”=5. The PAM250 matrix is selected as the default residue weight table. As with polynucleotide alignments, the percent identity is reported by CLUSTAL V as the “percent similarity” between aligned polypeptide sequence pairs.

Alternatively, the NCBI BLAST software suite may be used. For example, for a pairwise comparison of two polypeptide sequences, one may use the “BLAST 2 Sequences” tool Version 2.0.12 (Apr. 21, 2000) with blastp set at default parameters. Such default parameters may be, for example:

Matrix: BLOSUM62

Open Gap: 11 and Extension Gap: 1 penalties

Gap x drop-off: 50

Expect: 10

Word Size: 3

Filter: on

Percent identity may be measured over the length of an entire definedpolypeptide sequence, for example, as defined by a particular SEQ IDnumber, or may be measured over a shorter length, for example, over thelength of a fragment taken from a larger, defined polypeptide sequence,for instance, a fragment of at least 15, at least 20, at least 30, atleast 40, at least 50, at least 70 or at least 150 contiguous residues.Such lengths are exemplary only, and it is understood that any fragmentlength supported by the sequences shown herein, in the tables, figuresor Sequence Listing, may be used to describe a length over whichpercentage identity may be measured.

“Human artificial chromosomes” (HACs) are linear microchromosomes which may contain DNA sequences of about 6 kb to 10 Mb in size and which contain all of the elements required for chromosome replication, segregation, and maintenance.

The term “humanized antibody” refers to an antibody molecule in which the amino acid sequence in the non-antigen binding regions has been altered so that the antibody more closely resembles a human antibody while still retaining its original binding ability.

“Hybridization” refers to the process by which a polynucleotide strand anneals with a complementary strand through base pairing under defined hybridization conditions. Specific hybridization is an indication that two nucleic acid sequences share a high degree of complementarity. Specific hybridization complexes form under permissive annealing conditions and remain hybridized after the “washing” step(s). The washing step(s) are particularly important in determining the stringency of the hybridization process, with more stringent conditions allowing less non-specific binding, i.e., binding between pairs of nucleic acid strands that are not perfectly matched. Permissive conditions for annealing of nucleic acid sequences are routinely determinable by one of ordinary skill in the art and may be consistent among hybridization experiments, whereas wash conditions may be varied among experiments to achieve the desired stringency, and therefore hybridization specificity. Permissive annealing conditions occur, for example, at 68° C. in the presence of about 6×SSC, about 1% (w/v) SDS, and about 100 μg/ml sheared, denatured salmon sperm DNA.

Generally, stringency of hybridization is expressed, in part, with reference to the temperature under which the wash step is carried out. Such wash temperatures are typically selected to be about 5° C. to 20° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. An equation for calculating Tm and conditions for nucleic acid hybridization are well known and can be found in Sambrook, J. et al. (1989) Molecular Cloning: A Laboratory Manual, 2nd ed., vol. 1-3, Cold Spring Harbor Press, Plainview N.Y.; specifically see volume 2, chapter 9.
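The text cites Sambrook for the Tm equation rather than stating it. Purely as an illustrative assumption, the sketch below uses one widely quoted empirical approximation for long DNA:DNA hybrids, Tm ≈ 81.5 + 16.6·log10[Na+] + 0.41·(%G+C) − 600/L, where [Na+] is in mol/L and L is the probe length in nucleotides. Coefficients and extra terms (for example, a formamide correction) differ between sources, so the cited reference should be consulted for the equation actually intended. The second function only applies the 5° C. to 20° C. offset stated above; both function names are hypothetical.

```python
from __future__ import annotations

import math


def approx_tm(probe: str, na_molar: float) -> float:
    """Rough Tm (deg C) of a long DNA:DNA hybrid with a perfectly matched probe.

    Assumed empirical form: 81.5 + 16.6*log10[Na+] + 0.41*(%G+C) - 600/L.
    Not intended for short oligonucleotides or RNA:DNA hybrids.
    """
    seq = probe.upper()
    if not seq:
        raise ValueError("probe sequence is empty")
    if na_molar <= 0:
        raise ValueError("Na+ concentration must be positive")
    gc_percent = 100.0 * sum(base in "GC" for base in seq) / len(seq)
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 600.0 / len(seq)


def wash_temperature_window(tm: float) -> tuple[float, float]:
    """Wash temperatures about 20 to 5 deg C below Tm, per the text above."""
    return tm - 20.0, tm - 5.0
```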

High stringency conditions for hybridization between polynucleotides of the present invention include wash conditions of 68° C. in the presence of about 0.2×SSC and about 0.1% SDS, for 1 hour. Alternatively, temperatures of about 65° C., 60° C., 55° C., or 42° C. may be used. SSC concentration may be varied from about 0.1 to 2×SSC, with SDS being present at about 0.1%. Typically, blocking reagents are used to block non-specific hybridization. Such blocking reagents include, for instance, sheared and denatured salmon sperm DNA at about 100-200 μg/ml. Organic solvent, such as formamide at a concentration of about 35-50% v/v, may also be used under particular circumstances, such as for RNA:DNA hybridizations. Useful variations on these wash conditions will be readily apparent to those of ordinary skill in the art. Hybridization, particularly under high stringency conditions, may be suggestive of evolutionary similarity between the nucleotide sequences. Such similarity is strongly indicative of a similar role for the nucleotide sequences and their encoded polypeptides.
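Tm approximations are usually written in terms of monovalent cation concentration, so it can help to convert an SSC strength such as 0.2×SSC into [Na+]. The sketch assumes the standard 1×SSC recipe of 0.15 M NaCl and 0.015 M trisodium citrate, giving 0.15 + 3 × 0.015 = 0.195 M Na+ at 1×; it deliberately ignores sodium contributed by SDS or other additives. The constant and function names are illustrative.

```python
# Assumed standard recipe: 1x SSC = 0.15 M NaCl + 0.015 M trisodium citrate.
NACL_1X_MOLAR = 0.150
CITRATE_1X_MOLAR = 0.015
NA_PER_1X_SSC_MOLAR = NACL_1X_MOLAR + 3 * CITRATE_1X_MOLAR  # 0.195 M Na+


def sodium_molarity(ssc_strength: float) -> float:
    """Na+ (mol/L) contributed by SSC at the given strength (e.g. 0.2 for 0.2xSSC)."""
    if ssc_strength <= 0:
        raise ValueError("SSC strength must be positive")
    return ssc_strength * NA_PER_1X_SSC_MOLAR


if __name__ == "__main__":
    # The 0.1x to 2x range mentioned for high stringency washes.
    for strength in (0.1, 0.2, 2.0):
        print(f"{strength}xSSC -> {sodium_molarity(strength):.4f} M Na+")
```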

The term “hybridization complex” refers to a complex formed between two nucleic acid sequences by virtue of the formation of hydrogen bonds between complementary bases. A hybridization complex may be formed in solution (e.g., C₀t or R₀t analysis) or formed between one nucleic acid sequence present in solution and another nucleic acid sequence immobilized on a solid support (e.g., paper, membranes, filters, chips, pins or glass slides, or any other appropriate substrate to which cells or their nucleic acids have been fixed).

The words “insertion” and “addition” refer to changes in an amino acid or nucleotide sequence resulting in the addition of one or more amino acid residues or nucleotides, respectively.

“Immune response” can refer to conditions associated with inflammation, trauma, immune disorders, or infectious or genetic disease, etc. These conditions can be characterized by expression of various factors, e.g., cytokines, chemokines, and other signaling molecules, which may affect cellular and systemic defense systems.

An “immunogenic fragment” is a polypeptide or oligopeptide fragment of PRTS which is capable of eliciting an immune response when introduced into a living organism, for example, a mammal. The term “immunogenic fragment” also includes any polypeptide or oligopeptide fragment of PRTS which is useful in any of the antibody production methods disclosed herein or known in the art.

The term “microarray” refers to an arrangement of a plurality of polynucleotides, polypeptides, or other chemical compounds on a substrate.

The terms “element” and “array element” refer to a polynucleotide, polypeptide, or other chemical compound having a unique and defined position on a microarray.

The term “modulate” refers to a change in the activity of PRTS. For example, modulation may cause an increase or a decrease in protein activity, binding characteristics, or any other biological, functional, or immunological properties of PRTS.

The phrases “nucleic acid” and “nucleic acid sequence” refer to a nucleotide, oligonucleotide, polynucleotide, or any fragment thereof. These phrases also refer to DNA or RNA of genomic or synthetic origin which may be single-stranded or double-stranded and may represent the sense or the antisense strand, to peptide nucleic acid (PNA), or to any DNA-like or RNA-like material.

“Operably linked” refers to the situation in which a first nucleic acid sequence is placed in a functional relationship with a second nucleic acid sequence. For instance, a promoter is operably linked to a coding sequence if the promoter affects the transcription or expression of the coding sequence. Operably linked DNA sequences may be in close proximity or contiguous and, where necessary to join two protein coding regions, in the same reading frame.

“Peptide nucleic acid” (PNA) refers to an antisense molecule or anti-gene agent which comprises an oligonucleotide of at least about 5 nucleotides in length linked to a peptide backbone of amino acid residues ending in lysine. The terminal lysine confers solubility to the composition. PNAs preferentially bind complementary single-stranded DNA or RNA and stop transcript elongation, and may be pegylated to extend their lifespan in the cell.

“Post-translational modification” of a PRTS may involve lipidation, glycosylation, phosphorylation, acetylation, racemization, proteolytic cleavage, and other modifications known in the art. These processes may occur synthetically or biochemically. Biochemical modifications will vary by cell type depending on the enzymatic milieu of PRTS.

“Probe” refers to nucleic acid sequences encoding PRTS, their complements, or fragments thereof, which are used to detect identical, allelic, or related nucleic acid sequences. Probes are isolated oligonucleotides or polynucleotides attached to a detectable label or reporter molecule. Typical labels include radioactive isotopes, ligands, chemiluminescent agents, and enzymes. “Primers” are short nucleic acids, usually DNA oligonucleotides, which may be annealed to a target polynucleotide by complementary base-pairing. The primer may then be extended along the target DNA strand by a DNA polymerase enzyme. Primer pairs can be used for amplification (and identification) of a nucleic acid sequence, e.g., by the polymerase chain reaction (PCR).

Probes and primers as used in the present invention typically comprise at least 15 contiguous nucleotides of a known sequence. In order to enhance specificity, longer probes and primers may also be employed, such as probes and primers that comprise at least 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, or at least 150 consecutive nucleotides of the disclosed nucleic acid sequences. Probes and primers may be considerably longer than these examples, and it is understood that any length supported by the specification, including the tables, figures, and Sequence Listing, may be used.
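To make "at least 15 contiguous nucleotides" concrete, the sketch below enumerates every window of a chosen length from a known sequence, and provides a reverse complement for deriving complementary probes, since probes may be sequences or their complements. It is only an enumeration: it does not test uniqueness, secondary structure, or hybridization behavior, and the function names are hypothetical.

```python
from __future__ import annotations

# Translation table for DNA base complements (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G, then reversed)."""
    return seq.translate(_COMPLEMENT)[::-1]


def candidate_probes(seq: str, length: int = 15) -> list[tuple[int, str]]:
    """All windows of `length` contiguous nucleotides as (1-based start, sequence)."""
    if not 1 <= length <= len(seq):
        raise ValueError("length must be between 1 and len(seq)")
    return [(i + 1, seq[i:i + length]) for i in range(len(seq) - length + 1)]
```

A sequence of N nucleotides yields N − length + 1 windows, so a 20-nucleotide sequence gives six 15-mers; a complementary probe for any window is `reverse_complement(window)`.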

Methods for preparing and using probes and primers are described in the references, for example Sambrook, J. et al. (1989) Molecular Cloning: A Laboratory Manual, 2nd ed., vol. 1-3, Cold Spring Harbor Press, Plainview N.Y.; Ausubel, F. M. et al. (1987) Current Protocols in Molecular Biology, Greene Publ. Assoc. & Wiley-Intersciences, New York N.Y.; Innis, M. et al. (1990) PCR Protocols, A Guide to Methods and Applications, Academic Press, San Diego Calif. PCR primer pairs can be derived from a known sequence, for example, by using computer programs intended for that purpose such as Primer (Version 0.5, 1991, Whitehead Institute for Biomedical Research, Cambridge Mass.).

Oligonucleotides for use as primers are selected using software known in the art for such purpose. For example, OLIGO 4.06 software is useful for the selection of PCR primer pairs of up to 100 nucleotides each, and for the analysis of oligonucleotides and larger polynucleotides of up to 5,000 nucleotides from an input polynucleotide sequence of up to 32 kilobases. Similar primer selection programs have incorporated additional features for expanded capabilities. For example, the PrimOU primer selection program (available to the public from the Genome Center at University of Texas Southwestern Medical Center, Dallas Tex.) is capable of choosing specific primers from megabase sequences and is thus useful for designing primers on a genome-wide scope. The Primer3 primer selection program (available to the public from the Whitehead Institute/MIT Center for Genome Research, Cambridge Mass.) allows the user to input a “mispriming library,” in which sequences to avoid as primer binding sites are user-specified. Primer3 is useful, in particular, for the selection of oligonucleotides for microarrays. (The source code for the latter two primer selection programs may also be obtained from their respective sources and modified to meet the user's specific needs.) The PrimeGen program (available to the public from the UK Human Genome Mapping Project Resource Centre, Cambridge UK) designs primers based on multiple sequence alignments, thereby allowing selection of primers that hybridize to either the most conserved or least conserved regions of aligned nucleic acid sequences. Hence, this program is useful for identification of both unique and conserved oligonucleotides and polynucleotide fragments. The oligonucleotides and polynucleotide fragments identified by any of the above selection methods are useful in hybridization technologies, for example, as PCR or sequencing primers, microarray elements, or specific probes to identify fully or partially complementary polynucleotides in a sample of nucleic acids. Methods of oligonucleotide selection are not limited to those described above.
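The programs named above apply far more elaborate criteria than can be shown here. Purely as an illustration, the following sketch screens a candidate primer with three simple heuristics: a length window, a G+C fraction window, and the Wallace-rule estimate Tm ≈ 2(A+T) + 4(G+C) °C, which is commonly applied to short oligonucleotides. The thresholds are illustrative assumptions and do not reflect the defaults or algorithms of OLIGO, PrimOU, Primer3, or PrimeGen; the function names are hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Wallace-rule melting temperature estimate (deg C) for a short oligo."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    if not p:
        raise ValueError("primer is empty")
    return (p.count("G") + p.count("C")) / len(p)


def passes_basic_filters(
    primer: str,
    length_range=(18, 25),     # illustrative assumption
    gc_range=(0.40, 0.60),     # illustrative assumption
    tm_range=(52, 64),         # illustrative assumption, deg C
) -> bool:
    """True if the primer satisfies all three simple heuristics."""
    if not length_range[0] <= len(primer) <= length_range[1]:
        return False
    if set(primer.upper()) - set("ACGT"):
        return False  # reject ambiguous or non-DNA characters
    if not gc_range[0] <= gc_fraction(primer) <= gc_range[1]:
        return False
    return tm_range[0] <= wallace_tm(primer) <= tm_range[1]
```

A 20-mer with ten G/C bases, such as "ATGCATGCATGCATGCATGC", has a Wallace-rule Tm of 60 °C and passes; a poly(A) 20-mer fails on G+C content.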

A “recombinant nucleic acid” is a sequence that is not naturally occurring or that has a sequence made by an artificial combination of two or more otherwise separated segments of sequence. This artificial combination is often accomplished by chemical synthesis or, more commonly, by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques such as those described in Sambrook, supra. The term recombinant includes nucleic acids that have been altered solely by addition, substitution, or deletion of a portion of the nucleic acid. Frequently, a recombinant nucleic acid may include a nucleic acid sequence operably linked to a promoter sequence. Such a recombinant nucleic acid may be part of a vector that is used, for example, to transform a cell.

Alternatively, such recombinant nucleic acids may be part of a viral vector, e.g., based on a vaccinia virus, that could be used to vaccinate a mammal wherein the recombinant nucleic acid is expressed, inducing a protective immunological response in the mammal.

A “regulatory element” refers to a nucleic acid sequence usually derived from untranslated regions of a gene and includes enhancers, promoters, introns, and 5′ and 3′ untranslated regions (UTRs). Regulatory elements interact with host or viral proteins which control transcription, translation, or RNA stability.

“Reporter molecules” are chemical or biochemical moieties used for labeling a nucleic acid, amino acid, or antibody. Reporter molecules include radionuclides; enzymes; fluorescent, chemiluminescent, or chromogenic agents; substrates; cofactors; inhibitors; magnetic particles; and other moieties known in the art.

An “RNA equivalent,” in reference to a DNA sequence, is composed of the same linear sequence of nucleotides as the reference DNA sequence with the exception that all occurrences of the nitrogenous base thymine are replaced with uracil, and the sugar backbone is composed of ribose instead of deoxyribose.
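At the level of a written sequence, the only visible change in an RNA equivalent is thymine (T) to uracil (U); the ribose backbone is a chemical property that a character string does not represent. A minimal sketch, assuming sequences use the letters A, C, G, T, and the ambiguity code N (the accepted alphabet and function name are illustrative choices):

```python
def rna_equivalent(dna: str) -> str:
    """Return the RNA equivalent of a DNA sequence string (T -> U, t -> u)."""
    invalid = set(dna.upper()) - set("ACGTN")
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    return dna.replace("T", "U").replace("t", "u")
```

For example, `rna_equivalent("ATGCTT")` returns "AUGCUU", with case preserved.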

The term “sample” is used in its broadest sense. A sample suspected of containing PRTS, nucleic acids encoding PRTS, or fragments thereof may comprise a bodily fluid; an extract from a cell, chromosome, organelle, or membrane isolated from a cell; a cell; genomic DNA, RNA, or cDNA, in solution or bound to a substrate; a tissue; a tissue print; etc.

The terms “specific binding” and “specifically binding” refer to that interaction between a protein or peptide and an agonist, an antibody, an antagonist, a small molecule, or any natural or synthetic binding composition. The interaction is dependent upon the presence of a particular structure of the protein, e.g., the antigenic determinant or epitope, recognized by the binding molecule. For example, if an antibody is specific for epitope “A,” the presence of a polypeptide comprising the epitope A, or the presence of free unlabeled A, in a reaction containing free labeled A and the antibody will reduce the amount of labeled A that binds to the antibody.

The term “substantially purified” refers to nucleic acid or amino acid sequences that are removed from their natural environment and are isolated or separated, and are at least 60% free, preferably at least 75% free, and most preferably at least 90% free from other components with which they are naturally associated.

A “substitution” refers to the replacement of one or more amino acid residues or nucleotides by different amino acid residues or nucleotides, respectively.

“Substrate” refers to any suitable rigid or semi-rigid support including membranes, filters, chips, slides, wafers, fibers, magnetic or nonmagnetic beads, gels, tubing, plates, polymers, microparticles, and capillaries. The substrate can have a variety of surface forms, such as wells, trenches, pins, channels, and pores, to which polynucleotides or polypeptides are bound.

A “transcript image” refers to the collective pattern of gene expression by a particular cell type or tissue under given conditions at a given time.

“Transformation” describes a process by which exogenous DNA is introduced into a recipient cell. Transformation may occur under natural or artificial conditions according to various methods well known in the art, and may rely on any known method for the insertion of foreign nucleic acid sequences into a prokaryotic or eukaryotic host cell. The method for transformation is selected based on the type of host cell being transformed and may include, but is not limited to, bacteriophage or viral infection, electroporation, heat shock, lipofection, and particle bombardment. The term “transformed cells” includes stably transformed cells in which the inserted DNA is capable of replication either as an autonomously replicating plasmid or as part of the host chromosome, as well as transiently transformed cells which express the inserted DNA or RNA for limited periods of time.

A “transgenic organism,” as used herein, is any organism, including but not limited to animals and plants, in which one or more of the cells of the organism contains heterologous nucleic acid introduced by way of human intervention, such as by transgenic techniques well known in the art. The nucleic acid is introduced into the cell, directly or indirectly by introduction into a precursor of the cell, by way of deliberate genetic manipulation, such as by microinjection or by infection with a recombinant virus. The term genetic manipulation does not include classical cross-breeding or in vitro fertilization, but rather is directed to the introduction of a recombinant DNA molecule. The transgenic organisms contemplated in accordance with the present invention include bacteria, cyanobacteria, fungi, plants, and animals. The isolated DNA of the present invention can be introduced into the host by methods known in the art, for example infection, transfection, transformation, or transconjugation. Techniques for transferring the DNA of the present invention into such organisms are widely known and provided in references such as Sambrook et al. (1989), supra.

A “variant” of a particular nucleic acid sequence is defined as a nucleic acid sequence having at least 40% sequence identity to the particular nucleic acid sequence over a certain length of one of the nucleic acid sequences using blastn with the “BLAST 2 Sequences” tool Version 2.0.9 (May 7, 1999) set at default parameters. Such a pair of nucleic acids may show, for example, at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% or greater sequence identity over a certain defined length. A variant may be described as, for example, an “allelic” (as defined above), “splice,” “species,” or “polymorphic” variant. A splice variant may have significant identity to a reference molecule, but will generally have a greater or lesser number of nucleotides due to alternative splicing of exons during mRNA processing. The corresponding polypeptide may possess additional functional domains or lack domains that are present in the reference molecule. Species variants are polynucleotide sequences that vary from one species to another. The resulting polypeptides will generally have significant amino acid identity relative to each other. A polymorphic variant is a variation in the polynucleotide sequence of a particular gene between individuals of a given species. Polymorphic variants also may encompass “single nucleotide polymorphisms” (SNPs) in which the polynucleotide sequence varies by one nucleotide base. The presence of SNPs may be indicative of, for example, a certain population, a disease state, or a propensity for a disease state.
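As a sketch of how single-base differences between two aligned copies of a gene might be listed, the function below reports each position where both sequences carry a base and the bases differ. It assumes the sequences are already aligned to equal length with "-" for gaps, and it skips gap columns because insertions and deletions are not single-base substitutions. A difference found this way is only a candidate SNP; calling a polymorphism requires data from multiple individuals. The function name is hypothetical.

```python
from __future__ import annotations


def single_nucleotide_differences(
    ref: str, other: str, gap: str = "-"
) -> list[tuple[int, str, str]]:
    """List (1-based alignment position, ref base, other base) for base mismatches."""
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have the same length")
    diffs = []
    for pos, (a, b) in enumerate(zip(ref.upper(), other.upper()), start=1):
        if a == gap or b == gap:
            continue  # indel column, not a single-base substitution
        if a != b:
            diffs.append((pos, a, b))
    return diffs
```

For example, comparing "ACGTAC" with "ACGAAC" reports a single T-to-A difference at position 4.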

A “variant” of a particular polypeptide sequence is defined as a polypeptide sequence having at least 40% sequence identity to the particular polypeptide sequence over a certain length of one of the polypeptide sequences using blastp with the “BLAST 2 Sequences” tool Version 2.0.9 (May 7, 1999) set at default parameters. Such a pair of polypeptides may show, for example, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% or greater sequence identity over a certain defined length of one of the polypeptides.

The Invention

The invention is based on the discovery of new human proteases (PRTS), the polynucleotides encoding PRTS, and the use of these compositions for the diagnosis, treatment, or prevention of gastrointestinal, cardiovascular, autoimmune/inflammatory, cell proliferative, developmental, epithelial, neurological, and reproductive disorders.

Table 1 summarizes the nomenclature for the full length polynucleotide and polypeptide sequences of the invention. Each polynucleotide and its corresponding polypeptide are correlated to a single Incyte project identification number (Incyte Project ID). Each polypeptide sequence is denoted by both a polypeptide sequence identification number (Polypeptide SEQ ID NO:) and an Incyte polypeptide sequence number (Incyte Polypeptide ID) as shown. Each polynucleotide sequence is denoted by both a polynucleotide sequence identification number (Polynucleotide SEQ ID NO:) and an Incyte polynucleotide consensus sequence number (Incyte Polynucleotide ID) as shown.

Table 2 shows sequences with homology to the polypeptides of the invention as identified by BLAST analysis against the GenBank protein (genpept) database. Columns 1 and 2 show the polypeptide sequence identification number (Polypeptide SEQ ID NO:) and the corresponding Incyte polypeptide sequence number (Incyte Polypeptide ID) for polypeptides of the invention. Column 3 shows the GenBank identification number (GenBank ID NO:) of the nearest GenBank homolog. Column 4 shows the probability score for the match between each polypeptide and its GenBank homolog. Column 5 shows the annotation of the GenBank homolog along with relevant citations where applicable, all of which are expressly incorporated by reference herein.

Table 3 shows various structural features of the polypeptides of the invention. Columns 1 and 2 show the polypeptide sequence identification number (SEQ ID NO:) and the corresponding Incyte polypeptide sequence number (Incyte Polypeptide ID) for each polypeptide of the invention. Column 3 shows the number of amino acid residues in each polypeptide. Column 4 shows potential phosphorylation sites, and column 5 shows potential glycosylation sites, as determined by the MOTIFS program of the GCG sequence analysis software package (Genetics Computer Group, Madison Wis.). Column 6 shows amino acid residues comprising signature sequences, domains, and motifs. Column 7 shows analytical methods for protein structure/function analysis and, in some cases, searchable databases to which the analytical methods were applied.

Together, Tables 2 and 3 summarize the properties of polypeptides of the invention, and these properties establish that the claimed polypeptides are proteases. For example, SEQ ID NO:1 is 89% identical to a human preprocathepsin L precursor (GenBank ID g190418) as determined by the Basic Local Alignment Search Tool (BLAST). (See Table 2.) The BLAST probability score is 4.5e−169, which indicates the probability of obtaining the observed polypeptide sequence alignment by chance. SEQ ID NO:1 also contains a papain family cysteine protease active site domain as determined by searching for statistically significant matches in the hidden Markov model (HMM)-based PFAM database of conserved protein family domains. (See Table 3.) The presence of this motif is confirmed by BLIMPS, MOTIFS, and PROFILESCAN analyses, providing further corroborative evidence that SEQ ID NO:1 is a cysteine protease of the papain family. In an alternative example, SEQ ID NO:6 has 44% local identity to Xenopus ovochymase, a polyprotease of the trypsin family (GenBank ID g2981641), as determined by the Basic Local Alignment Search Tool (BLAST). (See Table 2.) The BLAST probability score is 6.4e−201, which indicates the probability of obtaining the observed polypeptide sequence alignment by chance. SEQ ID NO:6 contains a number of protease active site domains as determined by searching for statistically significant matches in the hidden Markov model (HMM)-based PFAM database of conserved protein family domains. (See Table 3.) The presence of these motifs is confirmed by BLIMPS, MOTIFS, and PROFILESCAN analyses. These analyses also reveal the presence of kringle and CUB domains, as well as a signal peptide. Together, these data provide further corroborative evidence that SEQ ID NO:6 is a serine protease of the trypsin family. In an alternative example, SEQ ID NO:10 is 50% identical to a human ubiquitin-specific processing protease (GenBank ID g6941888) as determined by the Basic Local Alignment Search Tool (BLAST). (See Table 2.)
The BLAST probability score is 7.5e−273, which indicates the probability of obtaining the observed polypeptide sequence alignment by chance. SEQ ID NO:10 is also 51% identical to a murine ubiquitin-specific processing protease (GenBank ID g6941890) as determined by the BLAST analysis with a probability score of 4.0e−271. SEQ ID NO:10 also contains ubiquitin carboxyl-terminal hydrolase (i.e., ubiquitin-specific protease) domains as determined by searching for statistically significant matches in the hidden Markov model (HMM)-based PFAM database of conserved protein family domains. (See Table 3.) Data from BLIMPS and MOTIFS analyses provide further corroborative evidence that SEQ ID NO:10 is a ubiquitin-specific protease. In an alternative example, SEQ ID NO:16 has 52% local identity to Xenopus ADAM13 (GenBank ID g1916617) as determined by the Basic Local Alignment Search Tool (BLAST). (See Table 2.) The BLAST probability score is 1.4e−198, which indicates the probability of obtaining the observed polypeptide sequence alignment by chance. SEQ ID NO:16 contains a reprolysin family neutral zinc protease active site domain, a reprolysin family propeptide, and a disintegrin domain signature as determined by searching for statistically significant matches in the hidden Markov model (HMM)-based PFAM database of conserved protein family domains. (See Table 3.) The presence of these domains is confirmed by BLIMPS, MOTIFS, and PROFILESCAN analyses, providing further corroborative evidence that SEQ ID NO:16 is a metalloprotease of the ADAM family. In an alternative example, SEQ ID NO:17 is 30% identical to the human zinc metalloprotease ADAMTS6 (GenBank ID g5923786) as determined by CLUSTAL V analysis, and has 44% local identity as determined by the Basic Local Alignment Search Tool (BLAST). (See Table 2.) The BLAST probability score is 9.1e−164, which indicates the probability of obtaining the observed polypeptide sequence alignment by chance.
SEQ ID NO:17 also contains a zinc metalloprotease active site domain, a reprolysin family metalloprotease propeptide, and a type I thrombospondin domain as determined by searching for statistically significant matches in the hidden Markov model (HMM)-based PFAM database of conserved protein family domains. (See Table 3.) Data from BLIMPS analysis provide further corroborative evidence that SEQ ID NO:17 is a metalloprotease of the ADAMTS family. SEQ ID NO:2-5, SEQ ID NO:7-9, and SEQ ID NO:11-15 were analyzed and annotated in a similar manner. The algorithms and parameters for the analysis of SEQ ID NO:1-17 are described in Table 7.

As shown in Table 4, the full length polynucleotide sequences of the present invention were assembled using cDNA sequences or coding (exon) sequences derived from genomic DNA, or any combination of these two types of sequences. Columns 1 and 2 list the polynucleotide sequence identification number (Polynucleotide SEQ ID NO:) and the corresponding Incyte polynucleotide consensus sequence number (Incyte Polynucleotide ID) for each polynucleotide of the invention. Column 3 shows the length of each polynucleotide sequence in base pairs. Column 4 lists fragments of the polynucleotide sequences which are useful, for example, in hybridization or amplification technologies that identify SEQ ID NO:18-34 or that distinguish between SEQ ID NO:18-34 and related polynucleotide sequences. Column 5 shows identification numbers corresponding to cDNA sequences, coding sequences (exons) predicted from genomic DNA, and/or sequence assemblages comprised of both cDNA and genomic DNA. These sequences were used to assemble the full length polynucleotide sequences of the invention. Columns 6 and 7 of Table 4 show the nucleotide start (5′) and stop (3′) positions of the cDNA and/or genomic sequences in column 5 relative to their respective full length sequences.

The identification numbers in column 5 of Table 4 may refer specifically, for example, to Incyte cDNAs along with their corresponding cDNA libraries. For example, 6917460H1 is the identification number of an Incyte cDNA sequence, and PLACFER06 is the cDNA library from which it is derived. Incyte cDNAs for which cDNA libraries are not indicated were derived from pooled cDNA libraries (e.g., 72004319V1). Alternatively, the identification numbers in column 5 may refer to GenBank cDNAs or ESTs (e.g., g1365166) which contributed to the assembly of the full length polynucleotide sequences. In addition, the identification numbers in column 5 may identify sequences derived from the ENSEMBL (The Sanger Centre, Cambridge, UK) database (i.e., those sequences including the designation “ENST”). Alternatively, the identification numbers in column 5 may be derived from the NCBI RefSeq Nucleotide Sequence Records Database (i.e., those sequences including the designation “NM” or “NT”) or the NCBI RefSeq Protein Sequence Records (i.e., those sequences including the designation “NP”). Alternatively, the identification numbers in column 5 may refer to assemblages of both cDNA and Genscan-predicted exons brought together by an “exon stitching” algorithm. For example, FL_XXXXXX_N1_N2_YYYYY_N3_N4 represents a “stitched” sequence in which XXXXXX is the identification number of the cluster of sequences to which the algorithm was applied, YYYYY is the number of the prediction generated by the algorithm, and N1, N2, N3, . . ., if present, represent specific exons that may have been manually edited during analysis (see Example V). Alternatively, the identification numbers in column 5 may refer to assemblages of exons brought together by an “exon-stretching” algorithm.
For example, FLXXXXXX_gAAAAA_gBBBBB_1_N is the identification number of a “stretched” sequence, XXXXXX being the Incyte project identification number, gAAAAA being the GenBank identification number of the human genomic sequence to which the “exon-stretching” algorithm was applied, gBBBBB being the GenBank identification number or NCBI RefSeq identification number of the nearest GenBank protein homolog, and N referring to specific exons (see Example V). In instances where a RefSeq sequence was used as a protein homolog for the “exon-stretching” algorithm, a RefSeq identifier (denoted by “NM,” “NP,” or “NT”) may be used in place of the GenBank identifier (i.e., gBBBBB).

Alternatively, a prefix identifies component sequences that were hand-edited, predicted from genomic DNA sequences, or derived from a combination of sequence analysis methods. The following table lists examples of component sequence prefixes and corresponding sequence analysis methods associated with the prefixes (see Example IV and Example V).

Prefix            Type of analysis and/or examples of programs
GNN, GFG, ENST    Exon prediction from genomic sequences using, for example, GENSCAN (Stanford University, CA, USA) or FGENES (Computer Genomics Group, The Sanger Centre, Cambridge, UK).
GBI               Hand-edited analysis of genomic sequences.
FL                Stitched or stretched genomic sequences (see Example V).
INCY              Full length transcript and exon prediction from mapping of EST sequences to the genome. Genomic location and EST composition data are combined to predict the exons and resulting transcript.
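When many component identifiers must be sorted by the kind of analysis that produced them, the prefix table can be applied mechanically. The sketch below transcribes the prefixes listed in the table into a lookup and matches the longest listed prefix first. It assumes that an identifier beginning with a listed prefix belongs to that category, which is a simplification; identifiers without a listed prefix (for example, Incyte cDNA numbers such as 6917460H1) are reported as unknown. The identifiers in the usage note are hypothetical placeholders.

```python
# Prefixes and analysis methods transcribed from the prefix table in the text.
PREFIX_METHODS = {
    "GNN": "exon prediction from genomic sequence",
    "GFG": "exon prediction from genomic sequence",
    "ENST": "exon prediction from genomic sequence",
    "GBI": "hand-edited analysis of genomic sequences",
    "FL": "stitched or stretched genomic sequence",
    "INCY": "transcript and exon prediction from EST-to-genome mapping",
}


def analysis_method(identifier: str) -> str:
    """Return the analysis method implied by a component sequence prefix."""
    # Check longer prefixes first so a short prefix never shadows a longer one.
    for prefix in sorted(PREFIX_METHODS, key=len, reverse=True):
        if identifier.startswith(prefix):
            return PREFIX_METHODS[prefix]
    return "unknown (no listed prefix)"
```

For instance, a hypothetical identifier "GBI_example" maps to hand-edited analysis, and both stitched ("FL_...") and stretched ("FL...") identifiers map to the FL category.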

In some cases, Incyte cDNA coverage redundant with the sequence coverage shown in column 5 was obtained to confirm the final consensus polynucleotide sequence, but the relevant Incyte cDNA identification numbers are not shown.

Table 5 shows the representative cDNA libraries for those full length polynucleotide sequences which were assembled using Incyte cDNA sequences. The representative cDNA library is the Incyte cDNA library which is most frequently represented by the Incyte cDNA sequences which were used to assemble and confirm the above polynucleotide sequences. The tissues and vectors which were used to construct the cDNA libraries shown in Table 5 are described in Table 6.

The invention also encompasses PRTS variants. A preferred PRTS variant is one which has at least about 80%, or alternatively at least about 90%, or even at least about 95% amino acid sequence identity to the PRTS amino acid sequence, and which contains at least one functional or structural characteristic of PRTS.

The invention also encompasses polynucleotides which encode PRTS. In a particular embodiment, the invention encompasses a polynucleotide sequence comprising a sequence selected from the group consisting of SEQ ID NO:18-34, which encodes PRTS. The polynucleotide sequences of SEQ ID NO:18-34, as presented in the Sequence Listing, embrace the equivalent RNA sequences, wherein occurrences of the nitrogenous base thymine are replaced with uracil, and the sugar backbone is composed of ribose instead of deoxyribose.

The invention also encompasses a variant of a polynucleotide sequence encoding PRTS. In particular, such a variant polynucleotide sequence will have at least about 70%, or alternatively at least about 85%, or even at least about 95% polynucleotide sequence identity to the polynucleotide sequence encoding PRTS. A particular aspect of the invention encompasses a variant of a polynucleotide sequence comprising a sequence selected from the group consisting of SEQ ID NO:18-34 which has at least about 70%, or alternatively at least about 85%, or even at least about 95% polynucleotide sequence identity to a nucleic acid sequence selected from the group consisting of SEQ ID NO:18-34. Any one of the polynucleotide variants described above can encode an amino acid sequence which contains at least one functional or structural characteristic of PRTS.

It will be appreciated by those skilled in the art that as a result of the degeneracy of the genetic code, a multitude of polynucleotide sequences encoding PRTS, some bearing minimal similarity to the polynucleotide sequences of any known and naturally occurring gene, may be produced. Thus, the invention contemplates each and every possible variation of polynucleotide sequence that could be made by selecting combinations based on possible codon choices. These combinations are made in accordance with the standard triplet genetic code as applied to the polynucleotide sequence of naturally occurring PRTS, and all such variations are to be considered as being specifically disclosed.
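The size of this set of variants follows directly from the standard genetic code: the number of distinct coding sequences for a protein is the product, over its residues, of the number of synonymous codons for each residue. The sketch below builds the standard code table and computes that product. Whether a stop codon is counted is a parameter chosen here for illustration, and the peptides in the example are hypothetical.

```python
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code; codons in the order TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_codons(residue: str) -> list[str]:
    """All codons that encode the given one-letter residue ('*' = stop)."""
    codons = [codon for codon, aa in CODE.items() if aa == residue]
    if not codons:
        raise ValueError(f"unknown residue {residue!r}")
    return codons


def count_encodings(protein: str, include_stop: bool = True) -> int:
    """Number of distinct DNA sequences encoding `protein` under the standard code."""
    n = prod(len(synonymous_codons(aa)) for aa in protein.upper())
    if include_stop:
        n *= len(synonymous_codons("*"))
    return n


if __name__ == "__main__":
    # M has 1 codon, K has 2, T has 4, plus 3 possible stop codons.
    print(count_encodings("MKT"))
```

Because the count multiplies across residues, it grows exponentially with protein length, which is why even a modest-sized protein has an enormous number of possible encoding sequences.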

Although nucleotide sequences which encode PRTS and its variants are generally capable of hybridizing to the nucleotide sequence of the naturally occurring PRTS under appropriately selected conditions of stringency, it may be advantageous to produce nucleotide sequences encoding PRTS or its derivatives possessing a substantially different codon usage, e.g., inclusion of non-naturally occurring codons. Codons may be selected to increase the rate at which expression of the peptide occurs in a particular prokaryotic or eukaryotic host in accordance with the frequency with which particular codons are utilized by the host. Other reasons for substantially altering the nucleotide sequence encoding PRTS and its derivatives without altering the encoded amino acid sequences include the production of RNA transcripts having more desirable properties, such as a greater half-life, than transcripts produced from the naturally occurring sequence.
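Selecting codons "in accordance with the frequency with which particular codons are utilized by the host" can be sketched in its simplest form: for each residue, pick the host's most frequent synonymous codon. The usage table below is a hypothetical placeholder covering only a few residues; its numbers are not measured frequencies for any real organism. Picking the single most frequent codon is only one possible strategy.

```python
# HYPOTHETICAL codon-usage fractions for an unnamed host (placeholder values).
HYPOTHETICAL_USAGE = {
    "M": {"ATG": 1.00},
    "A": {"GCC": 0.40, "GCT": 0.26, "GCA": 0.23, "GCG": 0.11},
    "K": {"AAG": 0.58, "AAA": 0.42},
    "*": {"TGA": 0.50, "TAA": 0.30, "TAG": 0.20},
}


def back_translate(protein: str, usage: dict, add_stop: bool = True) -> str:
    """Encode `protein` using the most frequent codon for each residue."""
    residues = protein.upper() + ("*" if add_stop else "")
    codons = []
    for aa in residues:
        if aa not in usage:
            raise KeyError(f"no codon usage data for residue {aa!r}")
        table = usage[aa]
        # Highest-frequency synonymous codon; ties resolve to the first listed.
        codons.append(max(table, key=table.get))
    return "".join(codons)


if __name__ == "__main__":
    print(back_translate("MAK", HYPOTHETICAL_USAGE))
```

The encoded amino acid sequence is unchanged; only the choice among synonymous codons differs from the naturally occurring sequence.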

The invention also encompasses production of DNA sequences which encode PRTS and PRTS derivatives, or fragments thereof, entirely by synthetic chemistry. After production, the synthetic sequence may be inserted into any of the many available expression vectors and cell systems using reagents well known in the art. Moreover, synthetic chemistry may be used to introduce mutations into a sequence encoding PRTS or any fragment thereof.

Also encompassed by the invention are polynucleotide sequences that are capable of hybridizing to the claimed polynucleotide sequences, and, in particular, to those shown in SEQ ID NO:18-34 and fragments thereof, under various conditions of stringency. (See, e.g., Wahl, G. M. and S. L. Berger (1987) Methods Enzymol. 152:399-407; Kimmel, A. R. (1987) Methods Enzymol. 152:507-511.) Hybridization conditions, including annealing and wash conditions, are described in “Definitions.”

Methods for DNA sequencing are well known in the art and may be used to practice any of the embodiments of the invention. The methods may employ such enzymes as the Klenow fragment of DNA polymerase I, SEQUENASE (US Biochemical, Cleveland Ohio), Taq polymerase (Applied Biosystems), thermostable T7 polymerase (Amersham Pharmacia Biotech, Piscataway N.J.), or combinations of polymerases and proofreading exonucleases such as those found in the ELONGASE amplification system (Life Technologies, Gaithersburg Md.). Preferably, sequence preparation is automated with machines such as the MICROLAB 2200 liquid transfer system (Hamilton, Reno Nev.), PTC200 thermal cycler (MJ Research, Watertown Mass.) and ABI CATALYST 800 thermal cycler (Applied Biosystems). Sequencing is then carried out using either the ABI 373 or 377 DNA sequencing system (Applied Biosystems), the MEGABACE 1000 DNA sequencing system (Molecular Dynamics, Sunnyvale Calif.), or other systems known in the art. The resulting sequences are analyzed using a variety of algorithms which are well known in the art. (See, e.g., Ausubel, F. M. (1997) Short Protocols in Molecular Biology, John Wiley & Sons, New York N.Y., unit 7.7; Meyers, R. A. (1995) Molecular Biology and Biotechnology, Wiley VCH, New York N.Y., pp. 853-856.)

The nucleic acid sequences encoding PRTS may be extended utilizing a partial nucleotide sequence and employing various PCR-based methods known in the art to detect upstream sequences, such as promoters and regulatory elements. For example, one method which may be employed, restriction-site PCR, uses universal and nested primers to amplify unknown sequence from genomic DNA within a cloning vector. (See, e.g., Sarkar, G. (1993) PCR Methods Applic. 2:318-322.) Another method, inverse PCR, uses primers that extend in divergent directions to amplify unknown sequence from a circularized template. The template is derived from restriction fragments comprising a known genomic locus and surrounding sequences. (See, e.g., Triglia, T. et al. (1988) Nucleic Acids Res. 16:8186.) A third method, capture PCR, involves PCR amplification of DNA fragments adjacent to known sequences in human and yeast artificial chromosome DNA. (See, e.g., Lagerstrom, M. et al. (1991) PCR Methods Applic. 1:111-119.) In this method, multiple restriction enzyme digestions and ligations may be used to insert an engineered double-stranded sequence into a region of unknown sequence before performing PCR. Other methods which may be used to retrieve unknown sequences are known in the art. (See, e.g., Parker, J. D. et al. (1991) Nucleic Acids Res. 19:3055-3060.) Additionally, one may use PCR, nested primers, and PROMOTERFINDER libraries (Clontech, Palo Alto Calif.) to walk genomic DNA. This procedure avoids the need to screen libraries and is useful in finding intron/exon junctions. For all PCR-based methods, primers may be designed using commercially available software, such as OLIGO 4.06 primer analysis software (National Biosciences, Plymouth Minn.) or another appropriate program, to be about 22 to 30 nucleotides in length, to have a GC content of about 50% or more, and to anneal to the template at temperatures of about 68° C. to 72° C.
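Two of the primer criteria stated above, a length of about 22 to 30 nucleotides and a GC content of about 50% or more, are simple enough to check directly. The sketch below does only that. It does not check the 68° C. to 72° C. annealing window, because a meaningful melting-temperature estimate needs a thermodynamic (nearest-neighbor) model with salt and primer concentrations, of the kind used in primer-design software; that is not reproduced here. The function names and example primers are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in `seq` (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def check_primer(seq: str, min_len: int = 22, max_len: int = 30,
                 min_gc: float = 0.50) -> list[str]:
    """Return a list of rule violations; an empty list means the primer passes.

    Only length and GC content are checked; annealing temperature is not.
    """
    if not seq:
        return ["empty primer"]
    problems = []
    if not min_len <= len(seq) <= max_len:
        problems.append(f"length {len(seq)} outside {min_len}-{max_len} nt")
    gc = gc_fraction(seq)
    if gc < min_gc:
        problems.append(f"GC content {gc:.0%} below {min_gc:.0%}")
    return problems


if __name__ == "__main__":
    for primer in ("GCGCGCGCGCATATATATATGCGC", "ATATATATATATATATATATGCGC"):
        print(primer, check_primer(primer) or "passes length and GC checks")
```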

When screening for full length cDNAs, it is preferable to use libraries that have been size-selected to include larger cDNAs. In addition, random-primed libraries, which often include sequences containing the 5′ regions of genes, are preferable for situations in which an oligo d(T) library does not yield a full-length cDNA. Genomic libraries may be useful for extension of sequence into 5′ non-transcribed regulatory regions.

Capillary electrophoresis systems which are commercially available may be used to analyze the size or confirm the nucleotide sequence of sequencing or PCR products. In particular, capillary sequencing may employ flowable polymers for electrophoretic separation, four different nucleotide-specific, laser-stimulated fluorescent dyes, and a charge coupled device camera for detection of the emitted wavelengths. Output/light intensity may be converted to electrical signal using appropriate software (e.g., GENOTYPER and SEQUENCE NAVIGATOR, Applied Biosystems), and the entire process from loading of samples to computer analysis and electronic data display may be computer controlled. Capillary electrophoresis is especially preferable for sequencing small DNA fragments which may be present in limited amounts in a particular sample.

In another embodiment of the invention, polynucleotide sequences or fragments thereof which encode PRTS may be cloned in recombinant DNA molecules that direct expression of PRTS, or fragments or functional equivalents thereof, in appropriate host cells. Due to the inherent degeneracy of the genetic code, other DNA sequences which encode substantially the same or a functionally equivalent amino acid sequence may be produced and used to express PRTS.

The nucleotide sequences of the present invention can be engineered using methods generally known in the art in order to alter PRTS-encoding sequences for a variety of purposes including, but not limited to, modification of the cloning, processing, and/or expression of the gene product. DNA shuffling by random fragmentation and PCR reassembly of gene fragments and synthetic oligonucleotides may be used to engineer the nucleotide sequences. For example, oligonucleotide-mediated site-directed mutagenesis may be used to introduce mutations that create new restriction sites, alter glycosylation patterns, change codon preference, produce splice variants, and so forth.
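Whether a planned point mutation creates a new restriction site can be checked in silico by counting recognition-site occurrences before and after the change. The sketch below uses the EcoRI recognition sequence GAATTC as an example site; the test sequence is hypothetical. It does not check whether the mutation is silent at the amino acid level, which a real design would also need to confirm.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence, used here as an example


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) occurrence of `site`."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def point_mutant(seq: str, pos: int, base: str) -> str:
    """Return `seq` with the single base at 0-based `pos` replaced by `base`."""
    if not 0 <= pos < len(seq):
        raise IndexError("position outside sequence")
    if len(base) != 1 or base.upper() not in "ACGT":
        raise ValueError("base must be one of A, C, G, T")
    return seq[:pos] + base.upper() + seq[pos + 1:]


def creates_site(seq: str, pos: int, base: str, site: str) -> bool:
    """True if the point mutation increases the number of occurrences of `site`."""
    before = len(find_sites(seq, site))
    after = len(find_sites(point_mutant(seq, pos, base), site))
    return after > before


if __name__ == "__main__":
    template = "CCGAATTACC"  # hypothetical sequence, one base away from GAATTC
    print(creates_site(template, 7, "C", ECORI_SITE))
```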

The nucleotides of the present invention may be subjected to DNA shuffling techniques such as MOLECULARBREEDING (Maxygen Inc., Santa Clara Calif.; described in U.S. Pat. No. 5,837,458; Chang, C.-C. et al. (1999) Nat. Biotechnol. 17:793-797; Christians, F. C. et al. (1999) Nat. Biotechnol. 17:259-264; and Crameri, A. et al. (1996) Nat. Biotechnol. 14:315-319) to alter or improve the biological properties of PRTS, such as its biological or enzymatic activity or its ability to bind to other molecules or compounds. DNA shuffling is a process by which a library of gene variants is produced using PCR-mediated recombination of gene fragments. The library is then subjected to selection or screening procedures that identify those gene variants with the desired properties. These preferred variants may then be pooled and further subjected to recursive rounds of DNA shuffling and selection/screening. Thus, genetic diversity is created through “artificial” breeding and rapid molecular evolution. For example, fragments of a single gene containing random point mutations may be recombined, screened, and then reshuffled until the desired properties are optimized. Alternatively, fragments of a given gene may be recombined with fragments of homologous genes in the same gene family, either from the same or different species, thereby maximizing the genetic diversity of multiple naturally occurring genes in a directed and controllable manner.
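The recombine, screen, and reshuffle cycle described above can be modeled as a toy sequence-level simulation. This is a sketch under strong simplifying assumptions: the parent genes are positionally aligned and of equal length, recombination is modeled as random crossover points with each segment drawn from a random parent, and the `score` function stands in for a laboratory screen. It does not model the physical random fragmentation and PCR reassembly steps, and the parent sequences and scoring rule in the example are hypothetical.

```python
import random
from typing import Callable, Sequence


def shuffle_chimera(parents: Sequence[str], n_crossovers: int,
                    rng: random.Random) -> str:
    """Build one chimera from aligned parents using random crossover points."""
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to the same length")
    if not 0 <= n_crossovers < length:
        raise ValueError("n_crossovers must be in [0, length)")
    cuts = sorted(rng.sample(range(1, length), n_crossovers))
    bounds = [0] + cuts + [length]
    # Each segment between consecutive cut points comes from a randomly chosen parent.
    return "".join(rng.choice(parents)[start:end]
                   for start, end in zip(bounds, bounds[1:]))


def shuffle_and_select(parents: Sequence[str], score: Callable[[str], float],
                       rounds: int, library_size: int, n_crossovers: int,
                       seed: int = 0) -> str:
    """Recursive rounds of shuffling followed by keeping the top-scoring variants."""
    if library_size < 1:
        raise ValueError("library_size must be at least 1")
    rng = random.Random(seed)
    pool = list(parents)
    keep = len(parents)
    for _ in range(rounds):
        library = [shuffle_chimera(pool, n_crossovers, rng) for _ in range(library_size)]
        library.sort(key=score, reverse=True)  # the "screen"
        pool = library[:keep]                  # preferred variants are pooled
    return pool[0]


if __name__ == "__main__":
    parents = ["AAAAAAAAAA", "CCCCCCCCCC"]  # hypothetical aligned parents
    best = shuffle_and_select(parents, lambda s: s.count("C"), rounds=5,
                              library_size=20, n_crossovers=3, seed=1)
    print(best)
```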

In another embodiment, sequences encoding PRTS may be synthesized, in whole or in part, using chemical methods well known in the art. (See, e.g., Caruthers, M. H. et al. (1980) Nucleic Acids Symp. Ser. 7:215-223; and Horn, T. et al. (1980) Nucleic Acids Symp. Ser. 7:225-232.) Alternatively, PRTS itself or a fragment thereof may be synthesized using chemical methods. For example, peptide synthesis can be performed using various solution-phase or solid-phase techniques. (See, e.g., Creighton, T. (1984) Proteins, Structures and Molecular Properties, W H Freeman, New York N.Y., pp. 55-60; and Roberge, J. Y. et al. (1995) Science 269:202-204.) Automated synthesis may be achieved using the ABI 431A peptide synthesizer (Applied Biosystems). Additionally, the amino acid sequence of PRTS, or any part thereof, may be altered during direct synthesis and/or combined with sequences from other proteins, or any part thereof, to produce a variant polypeptide or a polypeptide having a sequence of a naturally occurring polypeptide.

The peptide may be substantially purified by preparative high performance liquid chromatography. (See, e.g., Chicz, R. M. and F. E. Regnier (1990) Methods Enzymol. 182:392-421.) The composition of the synthetic peptides may be confirmed by amino acid analysis or by sequencing. (See, e.g., Creighton, supra, pp. 29-53.)

In order to express a biologically active PRTS, the nucleotide sequences encoding PRTS or derivatives thereof may be inserted into an appropriate expression vector, i.e., a vector which contains the necessary elements for transcriptional and translational control of the inserted coding sequence in a suitable host. These elements include regulatory sequences, such as enhancers, constitutive and inducible promoters, and 5′ and 3′ untranslated regions in the vector and in polynucleotide sequences encoding PRTS. Such elements may vary in their strength and specificity. Specific initiation signals may also be used to achieve more efficient translation of sequences encoding PRTS. Such signals include the ATG initiation codon and adjacent sequences, e.g., the Kozak sequence. In cases where sequences encoding PRTS and its initiation codon and upstream regulatory sequences are inserted into the appropriate expression vector, no additional transcriptional or translational control signals may be needed. However, in cases where only coding sequence, or a fragment thereof, is inserted, exogenous translational control signals including an in-frame ATG initiation codon should be provided by the vector. Exogenous translational elements and initiation codons may be of various origins, both natural and synthetic. The efficiency of expression may be enhanced by the inclusion of enhancers appropriate for the particular host cell system used. (See, e.g., Scharf, D. et al. (1994) Results Probl. Cell Differ. 20:125-162.)
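The Kozak context around an ATG can be screened with a simple rule. The sketch assumes the common simplification that the most important positions of the Kozak consensus (GCCRCCATGG) are a purine (A or G) at position -3 and a G at position +4, numbering the A of ATG as +1. It labels an ATG "strong" only when both hold; this two-position rule and the strong/weak labels are simplifications chosen for illustration, and the example sequences are hypothetical.

```python
def atg_contexts(seq: str) -> list[tuple[int, bool]]:
    """List (0-based index of each ATG, whether its context is 'strong').

    'Strong' here means a purine at -3 and G at +4 (A of ATG = +1).
    RNA input is accepted; U is treated as T.
    """
    s = seq.upper().replace("U", "T")
    results = []
    for i in range(len(s) - 2):
        if s[i:i + 3] == "ATG":
            minus3 = s[i - 3] if i >= 3 else None
            plus4 = s[i + 3] if i + 3 < len(s) else None
            strong = minus3 in ("A", "G") and plus4 == "G"
            results.append((i, strong))
    return results


if __name__ == "__main__":
    print(atg_contexts("GCCACCATGG"))  # consensus-like context
    print(atg_contexts("TTTTTTATGC"))  # weak context
```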

Methods which are well known to those skilled in the art may be used to construct expression vectors containing sequences encoding PRTS and appropriate transcriptional and translational control elements. These methods include in vitro recombinant DNA techniques, synthetic techniques, and in vivo genetic recombination. (See, e.g., Sambrook, J. et al. (1989) Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, Plainview N.Y., ch. 4, 8, and 16-17; Ausubel, F. M. et al. (1995) Current Protocols in Molecular Biology, John Wiley & Sons, New York N.Y., ch. 9, 13, and 16.)

A variety of expression vector/host systems may be utilized to contain and express sequences encoding PRTS. These include, but are not limited to, microorganisms such as bacteria transformed with recombinant bacteriophage, plasmid, or cosmid DNA expression vectors; yeast transformed with yeast expression vectors; insect cell systems infected with viral expression vectors (e.g., baculovirus); plant cell systems transformed with viral expression vectors (e.g., cauliflower mosaic virus, CaMV, or tobacco mosaic virus, TMV) or with bacterial expression vectors (e.g., Ti or pBR322 plasmids); or animal cell systems. (See, e.g., Sambrook, supra; Ausubel, supra; Van Heeke, G. and S. M. Schuster (1989) J. Biol. Chem. 264:5503-5509; Engelhard, E. K. et al. (1994) Proc. Natl. Acad. Sci. USA 91:3224-3227; Sandig, V. et al. (1996) Hum. Gene Ther. 7:1937-1945; Takamatsu, N. (1987) EMBO J. 6:307-311; The McGraw Hill Yearbook of Science and Technology (1992) McGraw Hill, New York N.Y., pp. 191-196; Logan, J. and T. Shenk (1984) Proc. Natl. Acad. Sci. USA 81:3655-3659; and Harrington, J. J. et al. (1997) Nat. Genet. 15:345-355.) Expression vectors derived from retroviruses, adenoviruses, or herpes or vaccinia viruses, or from various bacterial plasmids, may be used for delivery of nucleotide sequences to the targeted organ, tissue, or cell population. (See, e.g., Di Nicola, M. et al. (1998) Cancer Gene Ther. 5(6):350-356; Yu, M. et al. (1993) Proc. Natl. Acad. Sci. USA 90(13):6340-6344; Buller, R. M. et al. (1985) Nature 317(6040):813-815; McGregor, D. P. et al. (1994) Mol. Immunol. 31(3):219-226; and Verma, I. M. and N. Somia (1997) Nature 389:239-242.) The invention is not limited by the host cell employed.

In bacterial systems, a number of cloning and expression vectors may be selected depending upon the use intended for polynucleotide sequences encoding PRTS. For example, routine cloning, subcloning, and propagation of polynucleotide sequences encoding PRTS can be achieved using a multifunctional E. coli vector such as PBLUESCRIPT (Stratagene, La Jolla Calif.) or PSPORT1 plasmid (Life Technologies). Ligation of sequences encoding PRTS into the vector's multiple cloning site disrupts the lacZ gene, allowing a colorimetric screening procedure for identification of transformed bacteria containing recombinant molecules. In addition, these vectors may be useful for in vitro transcription, dideoxy sequencing, single strand rescue with helper phage, and creation of nested deletions in the cloned sequence. (See, e.g., Van Heeke, G. and S. M. Schuster (1989) J. Biol. Chem. 264:5503-5509.) When large quantities of PRTS are needed, e.g., for the production of antibodies, vectors which direct high level expression of PRTS may be used. For example, vectors containing the strong, inducible SP6 or T7 bacteriophage promoter may be used.

Yeast expression systems may be used for production of PRTS. A number of vectors containing constitutive or inducible promoters, such as alpha factor, alcohol oxidase, and PGH promoters, may be used in the yeast Saccharomyces cerevisiae or Pichia pastoris. In addition, such vectors direct either the secretion or intracellular retention of expressed proteins and enable integration of foreign sequences into the host genome for stable propagation. (See, e.g., Ausubel 1995, supra; Bitter, G. A. et al. (1987) Methods Enzymol. 153:516-544; and Scorer, C. A. et al. (1994) Bio/Technology 12:181-184.)

Plant systems may also be used for expression of PRTS. Transcription of sequences encoding PRTS may be driven by viral promoters, e.g., the 35S and 19S promoters of CaMV used alone or in combination with the omega leader sequence from TMV (Takamatsu, N. (1987) EMBO J. 6:307-311). Alternatively, plant promoters such as the small subunit of RUBISCO or heat shock promoters may be used. (See, e.g., Coruzzi, G. et al. (1984) EMBO J. 3:1671-1680; Broglie, R. et al. (1984) Science 224:838-843; and Winter, J. et al. (1991) Results Probl. Cell Differ. 17:85-105.) These constructs can be introduced into plant cells by direct DNA transformation or pathogen-mediated transfection. (See, e.g., The McGraw Hill Yearbook of Science and Technology (1992) McGraw Hill, New York N.Y., pp. 191-196.)

In mammalian cells, a number of viral-based expression systems may be utilized. In cases where an adenovirus is used as an expression vector, sequences encoding PRTS may be ligated into an adenovirus transcription/translation complex consisting of the late promoter and tripartite leader sequence. Insertion in a non-essential E1 or E3 region of the viral genome may be used to obtain infective virus which expresses PRTS in host cells. (See, e.g., Logan, J. and T. Shenk (1984) Proc. Natl. Acad. Sci. USA 81:3655-3659.) In addition, transcription enhancers, such as the Rous sarcoma virus (RSV) enhancer, may be used to increase expression in mammalian host cells. SV40 or EBV-based vectors may also be used for high-level protein expression.

Human artificial chromosomes (HACs) may also be employed to deliver larger fragments of DNA than can be contained in and expressed from a plasmid. HACs of about 6 kb to 10 Mb are constructed and delivered via conventional delivery methods (liposomes, polycationic amino polymers, or vesicles) for therapeutic purposes. (See, e.g., Harrington, J. J. et al. (1997) Nat. Genet. 15:345-355.)

For long term production of recombinant proteins in mammalian systems, stable expression of PRTS in cell lines is preferred. For example, sequences encoding PRTS can be transformed into cell lines using expression vectors which may contain viral origins of replication and/or endogenous expression elements and a selectable marker gene on the same or on a separate vector. Following the introduction of the vector, cells may be allowed to grow for about 1 to 2 days in enriched media before being switched to selective media. The purpose of the selectable marker is to confer resistance to a selective agent, and its presence allows growth and recovery of cells which successfully express the introduced sequences. Resistant clones of stably transformed cells may be propagated using tissue culture techniques appropriate to the cell type.

Any number of selection systems may be used to recover transformed cell lines. These include, but are not limited to, the herpes simplex virus thymidine kinase and adenine phosphoribosyltransferase genes, for use in tk⁻ and apr⁻ cells, respectively. (See, e.g., Wigler, M. et al. (1977) Cell 11:223-232; Lowy, I. et al. (1980) Cell 22:817-823.) Also, antimetabolite, antibiotic, or herbicide resistance can be used as the basis for selection. For example, dhfr confers resistance to methotrexate; neo confers resistance to the aminoglycosides neomycin and G418; and als and pat confer resistance to chlorsulfuron and phosphinothricin, respectively, pat encoding phosphinothricin acetyltransferase. (See, e.g., Wigler, M. et al. (1980) Proc. Natl. Acad. Sci. USA 77:3567-3570; Colbere-Garapin, F. et al. (1981) J. Mol. Biol. 150:1-14.) Additional selectable genes have been described, e.g., trpB and hisD, which alter cellular requirements for metabolites. (See, e.g., Hartman, S. C. and R. C. Mulligan (1988) Proc. Natl. Acad. Sci. USA 85:8047-8051.) Visible markers, e.g., anthocyanins, green fluorescent proteins (GFP; Clontech), β-glucuronidase and its substrate β-glucuronide, or luciferase and its substrate luciferin may be used. These markers can be used not only to identify transformants, but also to quantify the amount of transient or stable protein expression attributable to a specific vector system. (See, e.g., Rhodes, C. A. (1995) Methods Mol. Biol. 55:121-131.)

Although the presence/absence of marker gene expression suggests that the gene of interest is also present, the presence and expression of the gene may need to be confirmed. For example, if the sequence encoding PRTS is inserted within a marker gene sequence, transformed cells containing sequences encoding PRTS can be identified by the absence of marker gene function. Alternatively, a marker gene can be placed in tandem with a sequence encoding PRTS under the control of a single promoter. Expression of the marker gene in response to induction or selection usually indicates expression of the tandem gene as well.

In general, host cells that contain the nucleic acid sequence encoding PRTS and that express PRTS may be identified by a variety of procedures known to those of skill in the art. These procedures include, but are not limited to, DNA-DNA or DNA-RNA hybridizations, PCR amplification, and protein bioassay or immunoassay techniques which include membrane, solution, or chip based technologies for the detection and/or quantification of nucleic acid or protein sequences.

Immunological methods for detecting and measuring the expression of PRTS using either specific polyclonal or monoclonal antibodies are known in the art. Examples of such techniques include enzyme-linked immunosorbent assays (ELISAs), radioimmunoassays (RIAs), and fluorescence activated cell sorting (FACS). A two-site, monoclonal-based immunoassay utilizing monoclonal antibodies reactive to two non-interfering epitopes on PRTS is preferred, but a competitive binding assay may be employed. These and other assays are well known in the art. (See, e.g., Hampton, R. et al. (1990) Serological Methods, a Laboratory Manual, APS Press, St. Paul Minn., Sect. IV; Coligan, J. E. et al. (1997) Current Protocols in Immunology, Greene Pub. Associates and Wiley-Interscience, New York N.Y.; and Pound, J. D. (1998) Immunochemical Protocols, Humana Press, Totowa N.J.)

A wide variety of labels and conjugation techniques are known by those skilled in the art and may be used in various nucleic acid and amino acid assays. Means for producing labeled hybridization or PCR probes for detecting sequences related to polynucleotides encoding PRTS include oligolabeling, nick translation, end-labeling, or PCR amplification using a labeled nucleotide. Alternatively, the sequences encoding PRTS, or any fragments thereof, may be cloned into a vector for the production of an mRNA probe. Such vectors are known in the art, are commercially available, and may be used to synthesize RNA probes in vitro by addition of an appropriate RNA polymerase such as T7, T3, or SP6 and labeled nucleotides. These procedures may be conducted using a variety of commercially available kits, such as those provided by Amersham Pharmacia Biotech, Promega (Madison Wis.), and US Biochemical. Suitable reporter molecules or labels which may be used for ease of detection include radionuclides, enzymes, fluorescent, chemiluminescent, or chromogenic agents, as well as substrates, cofactors, inhibitors, magnetic particles, and the like.

Host cells transformed with nucleotide sequences encoding PRTS may be cultured under conditions suitable for the expression and recovery of the protein from cell culture. The protein produced by a transformed cell may be secreted or retained intracellularly depending on the sequence and/or the vector used. As will be understood by those of skill in the art, expression vectors containing polynucleotides which encode PRTS may be designed to contain signal sequences which direct secretion of PRTS through a prokaryotic or eukaryotic cell membrane.

In addition, a host cell strain may be chosen for its ability to modulate expression of the inserted sequences or to process the expressed protein in the desired fashion. Such modifications of the polypeptide include, but are not limited to, acetylation, carboxylation, glycosylation, phosphorylation, lipidation, and acylation. Post-translational processing which cleaves a “prepro” or “pro” form of the protein may also be used to specify protein targeting, folding, and/or activity. Different host cells which have specific cellular machinery and characteristic mechanisms for post-translational activities (e.g., CHO, HeLa, MDCK, HEK293, and WI38) are available from the American Type Culture Collection (ATCC, Manassas Va.) and may be chosen to ensure the correct modification and processing of the foreign protein.

In another embodiment of the invention, natural, modified, or recombinant nucleic acid sequences encoding PRTS may be ligated to a heterologous sequence resulting in translation of a fusion protein in any of the aforementioned host systems. For example, a chimeric PRTS protein containing a heterologous moiety that can be recognized by a commercially available antibody may facilitate the screening of peptide libraries for inhibitors of PRTS activity. Heterologous protein and peptide moieties may also facilitate purification of fusion proteins using commercially available affinity matrices. Such moieties include, but are not limited to, glutathione S-transferase (GST), maltose binding protein (MBP), thioredoxin (Trx), calmodulin binding peptide (CBP), 6-His, FLAG, c-myc, and hemagglutinin (HA). GST, MBP, Trx, CBP, and 6-His enable purification of their cognate fusion proteins on immobilized glutathione, maltose, phenylarsine oxide, calmodulin, and metal-chelate resins, respectively. FLAG, c-myc, and hemagglutinin (HA) enable immunoaffinity purification of fusion proteins using commercially available monoclonal and polyclonal antibodies that specifically recognize these epitope tags. A fusion protein may also be engineered to contain a proteolytic cleavage site located between the PRTS encoding sequence and the heterologous protein sequence, so that PRTS may be cleaved away from the heterologous moiety following purification. Methods for fusion protein expression and purification are discussed in Ausubel (1995, supra, ch. 10). A variety of commercially available kits may also be used to facilitate expression and purification of fusion proteins.
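The layout of such a fusion (tag, then cleavage site, then the protein of interest) and the effect of cleavage can be sketched at the sequence level. The specific choices below are assumptions for illustration and are not taken from this specification: a 6-His tag and the tobacco etch virus (TEV) protease site ENLYFQ/G, which is cut between Q and G. "MKTAYIAK" is a hypothetical stand-in for a PRTS sequence. Only the first occurrence of the site is treated as the engineered one.

```python
HIS6 = "HHHHHH"           # 6-His affinity tag
TEV_SITE = "ENLYFQG"      # TEV protease recognition site (example choice)
TEV_CUT_OFFSET = 6        # TEV cuts after the Q, i.e. after 6 residues of the site


def build_fusion(tag: str, site: str, target: str) -> str:
    """N-terminal tag, then cleavage site, then the protein of interest."""
    return tag + site + target


def cleave(fusion: str, site: str, cut_offset: int) -> tuple[str, str]:
    """Split `fusion` at the first occurrence of `site`, `cut_offset` residues in."""
    idx = fusion.find(site)
    if idx == -1:
        raise ValueError("cleavage site not found")
    cut = idx + cut_offset
    return fusion[:cut], fusion[cut:]


if __name__ == "__main__":
    fusion = build_fusion(HIS6, TEV_SITE, "MKTAYIAK")  # hypothetical target
    tag_part, released = cleave(fusion, TEV_SITE, TEV_CUT_OFFSET)
    print(tag_part, released)
```

With this particular site, the released protein carries one extra N-terminal glycine left over from the recognition sequence; other cleavage sites leave different residues behind.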

In a further embodiment of the invention, synthesis of radiolabeled PRTS may be achieved in vitro using the TNT rabbit reticulocyte lysate or wheat germ extract system (Promega). These systems couple transcription and translation of protein-coding sequences operably associated with the T7, T3, or SP6 promoters. Translation takes place in the presence of a radiolabeled amino acid precursor, for example, ³⁵S-methionine.
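When ³⁵S-methionine is the label, the number of methionine residues in the translated product sets how many labeled residues each full-length chain can carry. The sketch below translates a coding sequence with the standard genetic code, stopping at the first stop codon, and counts methionines. It assumes translation starts at the first base, every methionine position takes up label, and the initiator methionine is retained; the coding sequences in the example are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons in the order TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate from the first base until the first stop codon (U read as T)."""
    seq = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):  # trailing partial codon is ignored
        aa = CODE[seq[i:i + 3]]          # KeyError for non-ACGT characters
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def methionine_count(cds: str) -> int:
    """Methionine residues in the product, i.e. potential 35S-Met label sites."""
    return translate(cds).count("M")


if __name__ == "__main__":
    print(translate("ATGGCCTAA"), methionine_count("ATGATGAAATGA"))
```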

PRTS of the present invention or fragments thereof may be used to screen for compounds that specifically bind to PRTS. At least one and up to a plurality of test compounds may be screened for specific binding to PRTS. Examples of test compounds include antibodies, oligonucleotides, proteins (e.g., receptors), or small molecules.

In one embodiment, the compound thus identified is closely related to the natural ligand of PRTS, e.g., a ligand or fragment thereof, a natural substrate, a structural or functional mimetic, or a natural binding partner. (See, e.g., Coligan, J. E. et al. (1991) Current Protocols in Immunology 1(2): Chapter 5.) Similarly, the compound can be closely related to the natural receptor to which PRTS binds, or to at least a fragment of the receptor, e.g., the ligand binding site. In either case, the compound can be rationally designed using known techniques. In one embodiment, screening for these compounds involves producing appropriate cells which express PRTS, either as a secreted protein or on the cell membrane. Preferred cells include cells from mammals, yeast, Drosophila, or E. coli. Cells expressing PRTS or cell membrane fractions which contain PRTS are then contacted with a test compound and binding, stimulation, or inhibition of activity of either PRTS or the compound is analyzed.

An assay may simply test binding of a test compound to the polypeptide, wherein binding is detected by a fluorophore, radioisotope, enzyme conjugate, or other detectable label. For example, the assay may comprise the steps of combining at least one test compound with PRTS, either in solution or affixed to a solid support, and detecting the binding of PRTS to the compound. Alternatively, the assay may detect or measure binding of a test compound in the presence of a labeled competitor. Additionally, the assay may be carried out using cell-free preparations, chemical libraries, or natural product mixtures, and the test compound(s) may be free in solution or affixed to a solid support.

PRTS of the present invention or fragments thereof may be used to screen for compounds that modulate the activity of PRTS. Such compounds may include agonists, antagonists, or partial or inverse agonists. In one embodiment, an assay is performed under conditions permissive for PRTS activity, wherein PRTS is combined with at least one test compound, and the activity of PRTS in the presence of a test compound is compared with the activity of PRTS in the absence of the test compound. A change in the activity of PRTS in the presence of the test compound is indicative of a compound that modulates the activity of PRTS. Alternatively, a test compound is combined with an in vitro or cell-free system comprising PRTS under conditions suitable for PRTS activity, and the assay is performed. In either of these assays, a test compound which modulates the activity of PRTS may do so indirectly and need not come in direct contact with PRTS. At least one and up to a plurality of test compounds may be screened.
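The comparison of PRTS activity with and without a test compound can be expressed as a percent change relative to the untreated control. The sketch below computes that change and flags a compound when the change exceeds a cutoff. The 20% cutoff and the three labels are hypothetical choices for illustration; a real screen would normally base the decision on replicate measurements and a statistical criterion rather than a single fixed threshold.

```python
def percent_change(activity_with: float, activity_without: float) -> float:
    """Percent change in activity caused by the test compound, relative to control."""
    if activity_without == 0:
        raise ValueError("control (no-compound) activity must be nonzero")
    return 100.0 * (activity_with - activity_without) / activity_without


def classify(activity_with: float, activity_without: float,
             threshold_pct: float = 20.0) -> str:
    """Label a compound using a HYPOTHETICAL fixed percent-change cutoff."""
    change = percent_change(activity_with, activity_without)
    if change <= -threshold_pct:
        return "inhibits"
    if change >= threshold_pct:
        return "activates"
    return "no clear effect"


if __name__ == "__main__":
    # Arbitrary activity units: control = 100.
    for treated in (40.0, 105.0, 130.0):
        print(treated, percent_change(treated, 100.0), classify(treated, 100.0))
```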

In another embodiment, polynucleotides encoding PRTS or their mammalian homologs may be “knocked out” in an animal model system using homologous recombination in embryonic stem (ES) cells. Such techniques are well known in the art and are useful for the generation of animal models of human disease. (See, e.g., U.S. Pat. No. 5,175,383 and U.S. Pat. No. 5,767,337.) For example, mouse ES cells, such as the mouse 129/SvJ cell line, are derived from the early mouse embryo and grown in culture. The ES cells are transformed with a vector containing the gene of interest disrupted by a marker gene, e.g., the neomycin phosphotransferase gene (neo; Capecchi, M. R. (1989) Science 244:1288-1292). The vector integrates into the corresponding region of the host genome by homologous recombination. Alternatively, homologous recombination takes place using the Cre-loxP system to knock out a gene of interest in a tissue- or developmental stage-specific manner (Marth, J. D. (1996) J. Clin. Invest. 97:1999-2002; Wagner, K. U. et al. (1997) Nucleic Acids Res. 25:4323-4330). Transformed ES cells are identified and microinjected into mouse blastocysts such as those from the C57BL/6 mouse strain. The blastocysts are surgically transferred to pseudopregnant dams, and the resulting chimeric progeny are genotyped and bred to produce heterozygous or homozygous strains. Transgenic animals thus generated may be tested with potential therapeutic or toxic agents.
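Breeding heterozygous animals to obtain homozygous knockouts follows Mendelian segregation at a single locus. The sketch below enumerates gamete combinations for a cross and returns expected genotype proportions. It assumes one autosomal locus, equal transmission of both alleles, and no effect of the disrupted allele on viability; an essential gene would skew the observed ratios. The "+" (wild type) and "-" (targeted allele) notation is a convention adopted here.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent_a: str, parent_b: str) -> dict[str, Fraction]:
    """Expected offspring genotype proportions for a single-locus cross.

    Genotypes are written as two alleles separated by '/', e.g. '+/-'.
    """
    counts = Counter()
    for ga, gb in product(parent_a.split("/"), parent_b.split("/")):
        # Sort alleles so '+/-' and '-/+' are the same genotype
        # ('+' sorts before '-' in ASCII).
        counts["/".join(sorted((ga, gb)))] += 1
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


if __name__ == "__main__":
    # Heterozygote intercross: expected 1:2:1 ratio of +/+, +/-, -/-.
    for genotype, frac in sorted(cross("+/-", "+/-").items()):
        print(genotype, frac)
```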

Polynucleotides encoding PRTS may also be manipulated in vitro in ES cells derived from human blastocysts. Human ES cells have the potential to differentiate into at least eight separate cell lineages, including endoderm, mesoderm, and ectoderm cell types. These cell lineages differentiate into, for example, neural cells, hematopoietic lineages, and cardiomyocytes (Thomson, J. A. et al. (1998) Science 282:1145-1147).

Polynucleotides encoding PRTS can also be used to create “knockin” humanized animals (pigs) or transgenic animals (mice or rats) to model human disease. With knockin technology, a region of a polynucleotide encoding PRTS is injected into animal ES cells, and the injected sequence integrates into the animal cell genome. Transformed cells are injected into blastulae, and the blastulae are implanted as described above. Transgenic progeny or inbred lines are studied and treated with potential pharmaceutical agents to obtain information on treatment of a human disease. Alternatively, a mammal inbred to overexpress PRTS, e.g., by secreting PRTS in its milk, may also serve as a convenient source of that protein (Janne, J. et al. (1998) Biotechnol. Annu. Rev. 4:55-74).

Therapeutics

Chemical and structural similarity, e.g., in the context of sequences and motifs, exists between regions of PRTS and proteases. In addition, the expression of PRTS is closely associated with digestive, lung, neurological, gastrointestinal, cardiovascular, urinary, reproductive, fibroblastic, developmental, and endothelial tissues, and also with prostate cancer and other tumorous tissue. Therefore, PRTS appears to play a role in gastrointestinal, cardiovascular, autoimmune/inflammatory, cell proliferative, developmental, epithelial, neurological, and reproductive disorders. In the treatment of disorders associated with increased PRTS expression or activity, it is desirable to decrease the expression or activity of PRTS. In the treatment of disorders associated with decreased PRTS expression or activity, it is desirable to increase the expression or activity of PRTS.

Therefore, in one embodiment, PRTS or a fragment or derivative thereof may be administered to a subject to treat or prevent a disorder associated with decreased expression or activity of PRTS. Examples of such disorders include, but are not limited to, a gastrointestinal disorder, such as dysphagia, peptic esophagitis, esophageal spasm, esophageal stricture, esophageal carcinoma, dyspepsia, indigestion, gastritis, gastric carcinoma, anorexia, nausea, emesis, gastroparesis, antral or pyloric edema, abdominal angina, pyrosis, gastroenteritis, intestinal obstruction, infections of the intestinal tract, peptic ulcer, cholelithiasis, cholecystitis, cholestasis, pancreatitis, pancreatic carcinoma, biliary tract disease, hepatitis, hyperbilirubinemia, cirrhosis, passive congestion of the liver, hepatoma, infectious colitis, ulcerative colitis, ulcerative proctitis, Crohn's disease, Whipple's disease, Mallory-Weiss syndrome, colonic carcinoma, colonic obstruction, irritable bowel syndrome, short bowel syndrome, diarrhea, constipation, gastrointestinal hemorrhage, acquired immunodeficiency syndrome (AIDS) enteropathy, jaundice, hepatic encephalopathy, hepatorenal syndrome, hepatic steatosis, hemochromatosis, Wilson's disease, alpha₁-antitrypsin deficiency, Reye's syndrome, primary sclerosing cholangitis, liver infarction, portal vein obstruction and thrombosis, centrilobular necrosis, peliosis hepatis, hepatic vein thrombosis, veno-occlusive disease, preeclampsia, eclampsia, acute fatty liver of pregnancy, intrahepatic cholestasis of pregnancy, and hepatic tumors including nodular hyperplasias, adenomas, and carcinomas; a cardiovascular disorder, such as arteriovenous fistula, atherosclerosis, hypertension, vasculitis, Raynaud's disease, aneurysms, arterial dissections, varicose veins, thrombophlebitis and phlebothrombosis, vascular tumors, and complications of thrombolysis, balloon angioplasty, vascular replacement, and coronary artery bypass graft surgery, congestive heart failure, ischemic heart disease, angina pectoris, myocardial infarction, hypertensive heart disease, degenerative valvular heart disease, calcific aortic valve stenosis, congenitally bicuspid aortic valve, mitral annular calcification, mitral valve prolapse, rheumatic fever and rheumatic heart disease, infective endocarditis, nonbacterial thrombotic endocarditis, endocarditis of systemic lupus erythematosus, carcinoid heart disease, cardiomyopathy, myocarditis, pericarditis, neoplastic heart disease, congenital heart disease, and complications of cardiac transplantation; an autoimmune/inflammatory disorder, such as acquired immunodeficiency syndrome (AIDS), Addison's disease, adult respiratory distress syndrome, allergies, ankylosing spondylitis, amyloidosis, anemia, asthma, atherosclerosis, atherosclerotic plaque rupture, autoimmune hemolytic anemia, autoimmune thyroiditis, autoimmune polyendocrinopathy-candidiasis-ectodermal dystrophy (APECED), bronchitis, cholecystitis, contact dermatitis, Crohn's disease, atopic dermatitis, dermatomyositis, diabetes mellitus, emphysema, episodic lymphopenia with lymphocytotoxins, erythroblastosis fetalis, erythema nodosum, atrophic gastritis, glomerulonephritis, Goodpasture's syndrome, gout, Graves' disease, Hashimoto's thyroiditis, hypereosinophilia, irritable bowel syndrome, multiple sclerosis, myasthenia gravis, myocardial or pericardial inflammation, osteoarthritis, degradation of articular cartilage, osteoporosis, pancreatitis, polymyositis, psoriasis, Reiter's syndrome, rheumatoid arthritis, scleroderma, Sjögren's syndrome, systemic anaphylaxis, systemic lupus erythematosus, systemic sclerosis, thrombocytopenic purpura, ulcerative colitis, uveitis, Werner syndrome, complications of cancer, hemodialysis, and extracorporeal circulation, viral, bacterial, fungal, parasitic, protozoal, and helminthic infections, and trauma; a cell proliferative disorder, such as actinic keratosis, arteriosclerosis, atherosclerosis, bursitis, cirrhosis, hepatitis, mixed connective tissue disease (MCTD), myelofibrosis, paroxysmal nocturnal hemoglobinuria, polycythemia vera, psoriasis, primary thrombocythemia, and cancers including adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and, in particular, cancers of the adrenal gland, bladder, bone, bone marrow, brain, breast, cervix, gall bladder, ganglia, gastrointestinal tract, heart, kidney, liver, lung, muscle, ovary, pancreas, parathyroid, penis, prostate, salivary glands, skin, spleen, testis, thymus, thyroid, and uterus; a developmental disorder, such as renal tubular acidosis, anemia, Cushing's syndrome, achondroplastic dwarfism, Duchenne and Becker muscular dystrophy, bone resorption, epilepsy, gonadal dysgenesis, WAGR syndrome (Wilms' tumor, aniridia, genitourinary abnormalities, and mental retardation), Smith-Magenis syndrome, myelodysplastic syndrome, hereditary mucoepithelial dysplasia, hereditary keratodermas, hereditary neuropathies such as Charcot-Marie-Tooth disease and neurofibromatosis, hypothyroidism, hydrocephalus, seizure disorders such as Sydenham's chorea and cerebral palsy, spina bifida, anencephaly, craniorachischisis, congenital glaucoma, cataract, age-related macular degeneration, and sensorineural hearing loss; an epithelial disorder, such as dyshidrotic eczema, allergic contact dermatitis, keratosis pilaris, melasma, vitiligo, actinic keratosis, basal cell carcinoma, squamous cell carcinoma, seborrheic keratosis, folliculitis, herpes simplex, herpes zoster, varicella, candidiasis, dermatophytosis, scabies, insect bites, cherry angioma, keloid, dermatofibroma, acrochordons, urticaria, transient acantholytic dermatosis, xerosis, eczema, atopic dermatitis, contact dermatitis, hand eczema, nummular eczema, lichen simplex chronicus, asteatotic eczema, stasis dermatitis and stasis ulceration, seborrheic dermatitis, psoriasis, lichen planus, pityriasis rosea, impetigo, ecthyma, dermatophytosis, tinea versicolor, warts, acne vulgaris, acne rosacea, pemphigus vulgaris, pemphigus foliaceus, paraneoplastic pemphigus, bullous pemphigoid, herpes gestationis, dermatitis herpetiformis, linear IgA disease, epidermolysis bullosa acquisita, dermatomyositis, lupus erythematosus, scleroderma and morphea, erythroderma, alopecia, figurate skin lesions, telangiectasias, hypopigmentation, hyperpigmentation, vesicles/bullae, exanthems, cutaneous drug reactions, papulonodular skin lesions, chronic non-healing wounds, photosensitivity diseases, epidermolysis bullosa simplex, epidermolytic hyperkeratosis, epidermolytic and nonepidermolytic palmoplantar keratoderma, ichthyosis bullosa of Siemens, ichthyosis exfoliativa, keratosis palmaris et plantaris, keratosis palmoplantaris, palmoplantar keratoderma, keratosis punctata, Meesmann's corneal dystrophy, pachyonychia congenita, white sponge nevus, steatocystoma multiplex, epidermal nevi/epidermolytic hyperkeratosis type, monilethrix, trichothiodystrophy, chronic hepatitis/cryptogenic cirrhosis, and colorectal hyperplasia; a neurological disorder, such as epilepsy, ischemic cerebrovascular disease, stroke, cerebral neoplasms, Alzheimer's disease, Pick's disease, Huntington's disease, dementia, Parkinson's disease and other extrapyramidal disorders, amyotrophic lateral sclerosis and other motor neuron disorders, progressive neural muscular atrophy, retinitis pigmentosa, hereditary ataxias, multiple sclerosis and other demyelinating diseases, bacterial and viral meningitis, brain abscess, subdural empyema, epidural abscess, suppurative intracranial thrombophlebitis, myelitis and radiculitis, viral central nervous system disease, prion diseases including kuru, Creutzfeldt-Jakob disease, and Gerstmann-Straussler-Scheinker syndrome, fatal familial insomnia, nutritional and metabolic diseases of the nervous system, neurofibromatosis, tuberous sclerosis, cerebelloretinal hemangioblastomatosis, encephalotrigeminal syndrome, mental retardation and other developmental disorders of the central nervous system including Down syndrome, cerebral palsy, neuroskeletal disorders, autonomic nervous system disorders, cranial nerve disorders, spinal cord diseases, muscular dystrophy and other neuromuscular disorders, peripheral nervous system disorders, dermatomyositis and polymyositis, inherited, metabolic, endocrine, and toxic myopathies, myasthenia gravis, periodic paralysis, mental disorders including mood, anxiety, and schizophrenic disorders, seasonal affective disorder (SAD), akathisia, amnesia, catatonia, diabetic neuropathy, tardive dyskinesia, dystonias, paranoid psychoses, postherpetic neuralgia, Tourette's disorder, progressive supranuclear palsy, corticobasal degeneration, and familial frontotemporal dementia; and a reproductive disorder, such as infertility, including tubal disease, ovulatory defects, and endometriosis, a disorder of prolactin production, a disruption of the estrous cycle, a disruption of the menstrual cycle, polycystic ovary syndrome, ovarian hyperstimulation syndrome, an endometrial or ovarian tumor, a uterine fibroid, autoimmune disorders, an ectopic pregnancy, and teratogenesis; cancer of the breast, fibrocystic breast disease, and galactorrhea; a disruption of spermatogenesis, abnormal sperm physiology, cancer of the testis, cancer of the prostate, benign prostatic hyperplasia, prostatitis, Peyronie's disease, impotence, carcinoma of the male breast, and gynecomastia.

In another embodiment, a vector capable of expressing PRTS or a fragment or derivative thereof may be administered to a subject to treat or prevent a disorder associated with decreased expression or activity of PRTS including, but not limited to, those described above.

In a further embodiment, a composition comprising a substantially purified PRTS in conjunction with a suitable pharmaceutical carrier may be administered to a subject to treat or prevent a disorder associated with decreased expression or activity of PRTS including, but not limited to, those provided above.

In still another embodiment, an agonist which modulates the activity of PRTS may be administered to a subject to treat or prevent a disorder associated with decreased expression or activity of PRTS including, but not limited to, those listed above.

In a further embodiment, an antagonist of PRTS may be administered to a subject to treat or prevent a disorder associated with increased expression or activity of PRTS. Examples of such disorders include, but are not limited to, those gastrointestinal, cardiovascular, autoimmune/inflammatory, cell proliferative, developmental, epithelial, neurological, and reproductive disorders described above. In one aspect, an antibody which specifically binds PRTS may be used directly as an antagonist or indirectly as a targeting or delivery mechanism for bringing a pharmaceutical agent to cells or tissues which express PRTS.

In an additional embodiment, a vector expressing the complement of the polynucleotide encoding PRTS may be administered to a subject to treat or prevent a disorder associated with increased expression or activity of PRTS including, but not limited to, those described above.

In other embodiments, any of the proteins, antagonists, antibodies, agonists, complementary sequences, or vectors of the invention may be administered in combination with other appropriate therapeutic agents. Selection of the appropriate agents for use in combination therapy may be made by one of ordinary skill in the art, according to conventional pharmaceutical principles. The combination of therapeutic agents may act synergistically to effect the treatment or prevention of the various disorders described above. Using this approach, one may be able to achieve therapeutic efficacy with lower dosages of each agent, thus reducing the potential for adverse side effects.

An antagonist of PRTS may be produced using methods which are generally known in the art. In particular, purified PRTS may be used to produce antibodies or to screen libraries of pharmaceutical agents to identify those which specifically bind PRTS. Antibodies to PRTS may also be generated using methods that are well known in the art. Such antibodies may include, but are not limited to, polyclonal, monoclonal, chimeric, and single chain antibodies, Fab fragments, and fragments produced by a Fab expression library. Neutralizing antibodies (i.e., those which inhibit dimer formation) are generally preferred for therapeutic use.

For the production of antibodies, various hosts including goats, rabbits, rats, mice, humans, and others may be immunized by injection with PRTS or with any fragment or oligopeptide thereof which has immunogenic properties. Depending on the host species, various adjuvants may be used to increase immunological response. Such adjuvants include, but are not limited to, Freund's, mineral gels such as aluminum hydroxide, and surface active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, KLH, and dinitrophenol. Among adjuvants used in humans, BCG (bacilli Calmette-Guerin) and Corynebacterium parvum are especially preferable.

It is preferred that the oligopeptides, peptides, or fragments used to induce antibodies to PRTS have an amino acid sequence consisting of at least about 5 amino acids, and generally will consist of at least about 10 amino acids. It is also preferable that these oligopeptides, peptides, or fragments are identical to a portion of the amino acid sequence of the natural protein. Short stretches of PRTS amino acids may be fused with those of another protein, such as KLH, and antibodies to the chimeric molecule may be produced.
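Candidate immunizing peptides that satisfy the length and identity criteria above can be enumerated as contiguous windows of the protein sequence. The sketch below is illustrative only. The toy sequence is an arbitrary string of the 20 standard residues and is not the PRTS sequence. The function names are hypothetical, and the code does not predict immunogenicity.

```python
def candidate_peptides(protein, length=10):
    """Return every contiguous window of `length` residues from `protein`."""
    if length < 5:
        raise ValueError("windows shorter than about 5 residues are not considered")
    seq = protein.upper()
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]

def is_native_stretch(peptide, protein):
    """True if the peptide is identical to a contiguous stretch of the protein."""
    return peptide.upper() in protein.upper()

# Arbitrary toy sequence (the 20 standard residues), NOT the PRTS sequence
toy = "ACDEFGHIKLMNPQRSTVWY"
windows = candidate_peptides(toy, length=10)
print(len(windows), windows[0], windows[-1])  # 11 ACDEFGHIKL MNPQRSTVWY
```

Each window is by construction identical to a portion of the source sequence, so `is_native_stretch` mainly serves as a check on peptides obtained elsewhere, for example a modified stretch intended for fusion to a carrier such as KLH.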

Monoclonal antibodies to PRTS may be prepared using any technique which provides for the production of antibody molecules by continuous cell lines in culture. These include, but are not limited to, the hybridoma technique, the human B-cell hybridoma technique, and the EBV-hybridoma technique. (See, e.g., Kohler, G. et al. (1975) Nature 256:495-497; Kozbor, D. et al. (1985) J. Immunol. Methods 81:31-42; Cote, R. J. et al. (1983) Proc. Natl. Acad. Sci. USA 80:2026-2030; and Cole, S. P. et al. (1984) Mol. Cell Biochem. 62:109-120.)

In addition, techniques developed for the production of “chimeric antibodies,” such as the splicing of mouse antibody genes to human antibody genes to obtain a molecule with appropriate antigen specificity and biological activity, can be used. (See, e.g., Morrison, S. L. et al. (1984) Proc. Natl. Acad. Sci. USA 81:6851-6855; Neuberger, M. S. et al. (1984) Nature 312:604-608; and Takeda, S. et al. (1985) Nature 314:452-454.) Alternatively, techniques described for the production of single chain antibodies may be adapted, using methods known in the art, to produce PRTS-specific single chain antibodies. Antibodies with related specificity, but of distinct idiotypic composition, may be generated by chain shuffling from random combinatorial immunoglobulin libraries. (See, e.g., Burton, D. R. (1991) Proc. Natl. Acad. Sci. USA 88:10134-10137.)

Antibodies may also be produced by inducing in vivo production in the lymphocyte population or by screening immunoglobulin libraries or panels of highly specific binding reagents as disclosed in the literature. (See, e.g., Orlandi, R. et al. (1989) Proc. Natl. Acad. Sci. USA 86:3833-3837; Winter, G. et al. (1991) Nature 349:293-299.)

Antibody fragments which contain specific binding sites for PRTS may also be generated. For example, such fragments include, but are not limited to, F(ab′)₂ fragments produced by pepsin digestion of the antibody molecule and Fab′ fragments generated by reducing the disulfide bridges of the F(ab′)₂ fragments. Alternatively, Fab expression libraries may be constructed to allow rapid and easy identification of monoclonal Fab fragments with the desired specificity. (See, e.g., Huse, W. D. et al. (1989) Science 246:1275-1281.)

Various immunoassays may be used for screening to identify antibodies having the desired specificity. Numerous protocols for competitive binding or immunoradiometric assays using either polyclonal or monoclonal antibodies with established specificities are well known in the art. Such immunoassays typically involve the measurement of complex formation between PRTS and its specific antibody. A two-site, monoclonal-based immunoassay utilizing monoclonal antibodies reactive to two non-interfering PRTS epitopes is generally used, but a competitive binding assay may also be employed (Pound, supra).

Various methods such as Scatchard analysis in conjunction with radioimmunoassay techniques may be used to assess the affinity of antibodies for PRTS. Affinity is expressed as an association constant, K_(a), which is defined as the molar concentration of PRTS-antibody complex divided by the molar concentrations of free antigen and free antibody under equilibrium conditions. The K_(a) determined for a preparation of polyclonal antibodies, which are heterogeneous in their affinities for multiple PRTS epitopes, represents the average affinity, or avidity, of the antibodies for PRTS. The K_(a) determined for a preparation of monoclonal antibodies, which are monospecific for a particular PRTS epitope, represents a true measure of affinity. High-affinity antibody preparations with K_(a) ranging from about 10⁹ to 10¹² L/mole are preferred for use in immunoassays in which the PRTS-antibody complex must withstand rigorous manipulations. Low-affinity antibody preparations with K_(a) ranging from about 10⁶ to 10⁷ L/mole are preferred for use in immunopurification and similar procedures which ultimately require dissociation of PRTS, preferably in active form, from the antibody (Catty, D. (1988) Antibodies, Volume I: A Practical Approach, IRL Press, Washington D.C.; Liddell, J. E. and A. Cryer (1991) A Practical Guide to Monoclonal Antibodies, John Wiley & Sons, New York N.Y.).
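The definition of K_(a) given above, and the way a Scatchard plot recovers it, can be illustrated numerically. For a single class of binding sites, bound/free = K_(a)(B_max − bound), so a straight-line fit of bound/free against bound has slope −K_(a). The sketch below assumes single-site binding and uses synthetic data generated from a chosen K_(a) of 10¹⁰ L/mol. These are not measured values for PRTS or any antibody, and the function names are hypothetical.

```python
def association_constant(complex_m, free_antigen_m, free_antibody_m):
    """K_a = [complex] / ([free antigen] * [free antibody]), in L/mol."""
    return complex_m / (free_antigen_m * free_antibody_m)

def scatchard_ka(bound, free):
    """Least-squares slope of bound/free vs. bound; K_a is the negative slope."""
    x = list(bound)
    y = [b / f for b, f in zip(bound, free)]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return -sxy / sxx

def affinity_use(ka):
    """Map K_a onto the ranges given in the text (boundaries are approximate)."""
    if 1e9 <= ka <= 1e12:
        return "high affinity: rigorous immunoassays"
    if 1e6 <= ka <= 1e7:
        return "low affinity: immunopurification"
    return "outside the cited ranges"

# Synthetic single-site binding data generated with a chosen K_a (not measured data)
TRUE_KA = 1e10   # L/mol
B_MAX = 1e-9     # mol/L of binding sites
free = [1e-11, 5e-11, 1e-10, 5e-10, 1e-9]
bound = [B_MAX * TRUE_KA * f / (1 + TRUE_KA * f) for f in free]

ka = scatchard_ka(bound, free)
print(f"{ka:.3e}", affinity_use(ka))
```

With real polyclonal data the Scatchard plot is usually curved, because the preparation contains several affinity classes. In that case a single fitted slope gives only the average affinity, or avidity, described above.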

The titer and avidity of polyclonal antibody preparations may be further evaluated to determine the quality and suitability of such preparations for certain downstream applications. For example, a polyclonal antibody preparation containing at least 1-2 mg specific antibody/ml, preferably 5-10 mg specific antibody/ml, is generally employed in procedures requiring precipitation of PRTS-antibody complexes. Procedures for evaluating antibody specificity, titer, and avidity, and guidelines for antibody quality and usage in various applications, are generally available. (See, e.g., Catty, supra, and Coligan et al., supra.)

In another embodiment of the invention, the polynucleotides encoding PRTS, or any fragment or complement thereof, may be used for therapeutic purposes. In one aspect, modifications of gene expression can be achieved by designing complementary sequences or antisense molecules (DNA, RNA, PNA, or modified oligonucleotides) to the coding or regulatory regions of the gene encoding PRTS. Such technology is well known in the art, and antisense oligonucleotides or larger fragments can be designed from various locations along the coding or control regions of sequences encoding PRTS. (See, e.g., Agrawal, S., ed. (1996) Antisense Therapeutics, Humana Press Inc., Totowa N.J.)
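An antisense sequence of the kind described above is the reverse complement of the chosen sense-strand region, written 5′ to 3′. The sketch below computes this for a short hypothetical target. The target is not taken from the PRTS sequence. The sketch accepts only unambiguous A/C/G/T input and does not address chemistry such as PNA or modified backbones.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def antisense_dna(sense):
    """Reverse complement of a sense-strand DNA region (5'->3' in, 5'->3' out)."""
    seq = sense.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("expected an unambiguous DNA sequence (A/C/G/T only)")
    return seq.translate(_COMPLEMENT)[::-1]

def antisense_rna(sense):
    """Same antisense sequence written as RNA (T -> U)."""
    return antisense_dna(sense).replace("T", "U")

# Hypothetical sense-strand target region (not taken from the PRTS sequence)
target = "ATGGCGTTAC"
print(antisense_dna(target))  # GTAACGCCAT
print(antisense_rna(target))  # GUAACGCCAU
```

Reversing the complemented string keeps the output in the conventional 5′ to 3′ orientation, so it can be ordered or cloned as written.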

In therapeutic use, any gene delivery system suitable for introduction of the antisense sequences into appropriate target cells can be used. Antisense sequences can be delivered intracellularly in the form of an expression plasmid which, upon transcription, produces a sequence complementary to at least a portion of the cellular sequence encoding the target protein. (See, e.g., Slater, J. E. et al. (1998) J. Allergy Clin. Immunol. 102(3):469-475; and Scanlon, K. J. et al. (1995) 9(13):1288-1296.) Antisense sequences can also be introduced intracellularly through the use of viral vectors, such as retrovirus and adeno-associated virus vectors. (See, e.g., Miller, A. D. (1990) Blood 76:271; Ausubel, supra; Uckert, W. and W. Walther (1994) Pharmacol. Ther. 63(3):323-347.) Other gene delivery mechanisms include liposome-derived systems, artificial viral envelopes, and other systems known in the art. (See, e.g., Rossi, J. J. (1995) Br. Med. Bull. 51(1):217-225; Boado, R. J. et al. (1998) J. Pharm. Sci. 87(11):1308-1315; and Morris, M. C. et al. (1997) Nucleic Acids Res. 25(14):2730-2736.)

In another embodiment of the invention, polynucleotides encoding PRTS may be used for somatic or germline gene therapy. Gene therapy may be performed to (i) correct a genetic deficiency (e.g., in the cases of severe combined immunodeficiency (SCID)-X1 disease characterized by X-linked inheritance (Cavazzana-Calvo, M. et al. (2000) Science 288:669-672), severe combined immunodeficiency syndrome associated with an inherited adenosine deaminase (ADA) deficiency (Blaese, R. M. et al. (1995) Science 270:475-480; Bordignon, C. et al. (1995) Science 270:470-475), cystic fibrosis (Zabner, J. et al. (1993) Cell 75:207-216; Crystal, R. G. et al. (1995) Hum. Gene Therapy 6:643-666; Crystal, R. G. et al. (1995) Hum. Gene Therapy 6:667-703), thalassemias, familial hypercholesterolemia, and hemophilia resulting from Factor VIII or Factor IX deficiencies (Crystal, R. G. (1995) Science 270:404-410; Verma, I. M. and N. Somia (1997) Nature 389:239-242)), (ii) express a conditionally lethal gene product (e.g., in the case of cancers which result from unregulated cell proliferation), or (iii) express a protein which affords protection against intracellular parasites (e.g., against human retroviruses, such as human immunodeficiency virus (HIV) (Baltimore, D. (1988) Nature 335:395-396; Poeschla, E. et al. (1996) Proc. Natl. Acad. Sci. USA 93:11395-11399), hepatitis B or C virus (HBV, HCV); fungal parasites, such as Candida albicans and Paracoccidioides brasiliensis; and protozoan parasites such as Plasmodium falciparum and Trypanosoma cruzi). In the case where a genetic deficiency in PRTS expression or regulation causes disease, the expression of PRTS from an appropriate population of transduced cells may alleviate the clinical manifestations caused by the genetic deficiency.

In a further embodiment of the invention, diseases or disorders caused by deficiencies in PRTS are treated by constructing mammalian expression vectors encoding PRTS and introducing these vectors by mechanical means into PRTS-deficient cells. Mechanical transfer technologies for use with cells in vivo or ex vivo include (i) direct DNA microinjection into individual cells, (ii) ballistic gold particle delivery, (iii) liposome-mediated transfection, (iv) receptor-mediated gene transfer, and (v) the use of DNA transposons (Morgan, R. A. and W. F. Anderson (1993) Annu. Rev. Biochem. 62:191-217; Ivics, Z. (1997) Cell 91:501-510; Boulay, J-L. and H. Récipon (1998) Curr. Opin. Biotechnol. 9:445-450).

Expression vectors that may be effective for the expression of PRTS include, but are not limited to, the PCDNA 3.1, EPITAG, PRCCMV2, PREP, PVAX, PCR2-TOPOTA vectors (Invitrogen, Carlsbad Calif.), PCMV-SCRIPT, PCMV-TAG, PEGSH/PERV (Stratagene, La Jolla Calif.), and PTET-OFF, PTET-ON, PTRE2, PTRE2-LUC, PTK-HYG (Clontech, Palo Alto Calif.). PRTS may be expressed using (i) a constitutively active promoter (e.g., from cytomegalovirus (CMV), Rous sarcoma virus (RSV), SV40 virus, thymidine kinase (TK), or β-actin genes), (ii) an inducible promoter (e.g., the tetracycline-regulated promoter (Gossen, M. and H. Bujard (1992) Proc. Natl. Acad. Sci. USA 89:5547-5551; Gossen, M. et al. (1995) Science 268:1766-1769; Rossi, F. M. V. and H. M. Blau (1998) Curr. Opin. Biotechnol. 9:451-456), commercially available in the T-REX plasmid (Invitrogen); the ecdysone-inducible promoter (available in the plasmids PVGRXR and PIND; Invitrogen); the FK506/rapamycin inducible promoter; or the RU486/mifepristone inducible promoter (Rossi, F. M. V. and H. M. Blau, supra)), or (iii) a tissue-specific promoter or the native promoter of the endogenous gene encoding PRTS from a normal individual.

Commercially available liposome transformation kits (e.g., the PERFECT LIPID TRANSFECTION KIT, available from Invitrogen) allow one with ordinary skill in the art to deliver polynucleotides to target cells in culture and require minimal effort to optimize experimental parameters. In the alternative, transformation is performed using the calcium phosphate method (Graham, F. L. and A. J. van der Eb (1973) Virology 52:456-467), or by electroporation (Neumann, E. et al. (1982) EMBO J. 1:841-845). The introduction of DNA to primary cells requires modification of these standardized mammalian transfection protocols.

In another embodiment of the invention, diseases or disorders caused by genetic defects with respect to PRTS expression are treated by constructing a retrovirus vector consisting of (i) the polynucleotide encoding PRTS under the control of an independent promoter or the retrovirus long terminal repeat (LTR) promoter, (ii) appropriate RNA packaging signals, and (iii) a Rev-responsive element (RRE) along with additional retrovirus cis-acting RNA sequences and coding sequences required for efficient vector propagation. Retrovirus vectors (e.g., PFB and PFBNEO) are commercially available (Stratagene) and are based on published data (Riviere, I. et al. (1995) Proc. Natl. Acad. Sci. USA 92:6733-6737), incorporated by reference herein. The vector is propagated in an appropriate vector producing cell line (VPCL) that expresses an envelope gene with a tropism for receptors on the target cells or a promiscuous envelope protein such as VSVg (Armentano, D. et al. (1987) J. Virol. 61:1647-1650; Bender, M. A. et al. (1987) J. Virol. 61:1639-1646; Adam, M. A. and A. D. Miller (1988) J. Virol. 62:3802-3806; Dull, T. et al. (1998) J. Virol. 72:8463-8471; Zufferey, R. et al. (1998) J. Virol. 72:9873-9880). U.S. Pat. No. 5,910,434 to Rigg (“Method for obtaining retrovirus packaging cell lines producing high transducing efficiency retroviral supernatant”) discloses a method for obtaining retrovirus packaging cell lines and is hereby incorporated by reference. Propagation of retrovirus vectors, transduction of a population of cells (e.g., CD4⁺ T-cells), and the return of transduced cells to a patient are procedures well known to persons skilled in the art of gene therapy and have been well documented (Ranga, U. et al. (1997) J. Virol. 71:7020-7029; Bauer, G. et al. (1997) Blood 89:2259-2267; Bonyhadi, M. L. (1997) J. Virol. 71:4707-4716; Ranga, U. et al. (1998) Proc. Natl. Acad. Sci. USA 95:1201-1206; Su, L. (1997) Blood 89:2283-2290).

In the alternative, an adenovirus-based gene therapy delivery system is used to deliver polynucleotides encoding PRTS to cells which have one or more genetic abnormalities with respect to the expression of PRTS. The construction and packaging of adenovirus-based vectors are well known to those with ordinary skill in the art. Replication-defective adenovirus vectors have proven to be versatile for importing genes encoding immunoregulatory proteins into intact islets in the pancreas (Csete, M. E. et al. (1995) Transplantation 27:263-268). Potentially useful adenoviral vectors are described in U.S. Pat. No. 5,707,618 to Armentano (“Adenovirus vectors for gene therapy”), hereby incorporated by reference. For adenoviral vectors, see also Antinozzi, P. A. et al. (1999) Annu. Rev. Nutr. 19:511-544 and Verma, I. M. and N. Somia (1997) Nature 389:239-242, both incorporated by reference herein.

In another alternative, a herpes-based gene therapy delivery system is used to deliver polynucleotides encoding PRTS to target cells which have one or more genetic abnormalities with respect to the expression of PRTS. The use of herpes simplex virus (HSV)-based vectors may be especially valuable for introducing PRTS to cells of the central nervous system, for which HSV has a tropism. The construction and packaging of herpes-based vectors are well known to those with ordinary skill in the art. A replication-competent herpes simplex virus (HSV) type 1-based vector has been used to deliver a reporter gene to the eyes of primates (Liu, X. et al. (1999) Exp. Eye Res. 169:385-395). The construction of an HSV-1 virus vector has also been disclosed in detail in U.S. Pat. No. 5,804,413 to DeLuca (“Herpes simplex virus strains for gene transfer”), which is hereby incorporated by reference. U.S. Pat. No. 5,804,413 teaches the use of recombinant HSV d92, which consists of a genome containing at least one exogenous gene to be transferred to a cell under the control of the appropriate promoter for purposes including human gene therapy. Also taught by this patent are the construction and use of recombinant HSV strains deleted for ICP4, ICP27, and ICP22. For HSV vectors, see also Goins, W. F. et al. (1999) J. Virol. 73:519-532 and Xu, H. et al. (1994) Dev. Biol. 163:152-161, hereby incorporated by reference. The manipulation of cloned herpesvirus sequences, the generation of recombinant virus following the transfection of multiple plasmids containing different segments of the large herpesvirus genomes, the growth and propagation of herpesvirus, and the infection of cells with herpesvirus are techniques well known to those of ordinary skill in the art.

In another alternative, an alphavirus (positive, single-stranded RNA virus) vector is used to deliver polynucleotides encoding PRTS to target cells. The biology of the prototypic alphavirus, Semliki Forest Virus (SFV), has been studied extensively and gene transfer vectors have been based on the SFV genome (Garoff, H. and K.-J. Li (1998) Curr. Opin. Biotechnol. 9:464-469). During alphavirus RNA replication, a subgenomic RNA is generated that normally encodes the viral capsid proteins. This subgenomic RNA replicates to higher levels than the full length genomic RNA, resulting in the overproduction of capsid proteins relative to the viral proteins with enzymatic activity (e.g., protease and polymerase). Similarly, inserting the coding sequence for PRTS into the alphavirus genome in place of the capsid-coding region results in the production of a large number of PRTS-coding RNAs and the synthesis of high levels of PRTS in vector-transduced cells. While alphavirus infection is typically associated with cell lysis within a few days, the ability to establish a persistent infection in baby hamster kidney cells (BHK-21) with a variant of Sindbis virus (SIN) indicates that the lytic replication of alphaviruses can be altered to suit the needs of the gene therapy application (Dryga, S. A. et al. (1997) Virology 228:74-83). The wide host range of alphaviruses will allow the introduction of PRTS into a variety of cell types. The specific transduction of a subset of cells in a population may require the sorting of cells prior to transduction. The methods of manipulating infectious cDNA clones of alphaviruses, performing alphavirus cDNA and RNA transfections, and performing alphavirus infections are well known to those with ordinary skill in the art.

Oligonucleotides derived from the transcription initiation site, e.g., between about positions −10 and +10 from the start site, may also be employed to inhibit gene expression. Similarly, inhibition can be achieved using triple helix base-pairing methodology. Triple helix pairing is useful because it inhibits the ability of the double helix to open sufficiently for the binding of polymerases, transcription factors, or regulatory molecules. Recent therapeutic advances using triplex DNA have been described in the literature. (See, e.g., Gee, J. E. et al. (1994) in Huber, B. E. and B. J. Carr, Molecular and Immunologic Approaches, Futura Publishing, Mt. Kisco N.Y., pp. 163-177.) A complementary sequence or antisense molecule may also be designed to block translation of mRNA by preventing the transcript from binding to ribosomes.
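By way of illustration only, the following Python sketch shows how the region between about positions −10 and +10 of a transcription start site might be extracted from a genomic sequence and converted into a complementary (antisense) oligonucleotide. The function names, the 0-based index of the +1 nucleotide, and the numbering convention that skips position 0 are assumptions made for this example; the sketch does not address triple helix design.

```python
# Illustrative sketch only; all function names are hypothetical.
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.translate(_DNA_COMPLEMENT)[::-1]


def tss_window(genomic: str, tss_index: int, upstream: int = 10, downstream: int = 10) -> str:
    """Return the sense-strand region from -upstream to +downstream.

    tss_index is the 0-based index of the +1 nucleotide. Because the
    numbering skips 0, positions -10..+10 cover upstream + downstream bases.
    """
    start = tss_index - upstream
    end = tss_index + downstream
    if start < 0 or end > len(genomic):
        raise ValueError("window extends past the ends of the sequence")
    return genomic[start:end]


def antisense_oligo(genomic: str, tss_index: int, upstream: int = 10, downstream: int = 10) -> str:
    """Antisense oligonucleotide complementary to the start-site window."""
    return reverse_complement(tss_window(genomic, tss_index, upstream, downstream))


if __name__ == "__main__":
    # Toy sequence: ten A's upstream of +1, then G's from +1 onward.
    toy = "A" * 10 + "G" * 10 + "T" * 5
    print(antisense_oligo(toy, tss_index=10))  # CCCCCCCCCCTTTTTTTTTT
```

Reporting the oligonucleotide as a reverse complement keeps it in the conventional 5′ to 3′ orientation, so it can be read directly as a synthesis order.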

Ribozymes, enzymatic RNA molecules, may also be used to catalyze the specific cleavage of RNA. The mechanism of ribozyme action involves sequence-specific hybridization of the ribozyme molecule to complementary target RNA, followed by endonucleolytic cleavage. For example, engineered hammerhead motif ribozyme molecules may specifically and efficiently catalyze endonucleolytic cleavage of sequences encoding PRTS.

Specific ribozyme cleavage sites within any potential RNA target are initially identified by scanning the target molecule for ribozyme cleavage sites, including the following sequences: GUA, GUU, and GUC. Once identified, short RNA sequences of between 15 and 20 ribonucleotides, corresponding to the region of the target gene containing the cleavage site, may be evaluated for secondary structural features which may render the oligonucleotide inoperable. The suitability of candidate targets may also be evaluated by testing accessibility to hybridization with complementary oligonucleotides using ribonuclease protection assays.
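By way of illustration only, the scanning step described above might be sketched in Python as follows. The sketch reports every GUA, GUU, or GUC triplet in an RNA target and pairs each with a candidate region of 15 to 20 ribonucleotides containing it. The function names, the default region length of 17, and the roughly centered placement of the region are hypothetical choices; the secondary-structure and accessibility evaluations mentioned above are not modeled.

```python
# Illustrative sketch only; names and the default region length are hypothetical.
CLEAVAGE_TRIPLETS = ("GUA", "GUU", "GUC")


def find_cleavage_sites(rna: str) -> list[int]:
    """Return 0-based start positions of every GUA, GUU, or GUC triplet.

    Overlapping matches are reported; input is expected as RNA (U, not T).
    """
    rna = rna.upper()
    return [i for i in range(len(rna) - 2) if rna[i:i + 3] in CLEAVAGE_TRIPLETS]


def candidate_targets(rna: str, length: int = 17) -> list[tuple[int, str]]:
    """Pair each cleavage site with a short target region containing it.

    The region is roughly centered on the triplet; sites too close to the
    3' end for a full-length region are skipped.
    """
    if not 15 <= length <= 20:
        raise ValueError("target length should be between 15 and 20 nt")
    rna = rna.upper()
    flank = (length - 3) // 2
    targets = []
    for site in find_cleavage_sites(rna):
        start = max(0, site - flank)  # clip at the 5' end
        end = start + length
        if end <= len(rna):
            targets.append((site, rna[start:end]))
    return targets


if __name__ == "__main__":
    target = "AAGUCAAGUUAGUAC"
    print(find_cleavage_sites(target))            # [2, 7, 11]
    print(candidate_targets(target, length=15))   # [(2, 'AAGUCAAGUUAGUAC')]
```

Each returned region would then be the input to the structural and ribonuclease protection evaluations described in the text.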

Complementary ribonucleic acid molecules and ribozymes of the invention may be prepared by any method known in the art for the synthesis of nucleic acid molecules. These include techniques for chemically synthesizing oligonucleotides, such as solid phase phosphoramidite chemical synthesis. Alternatively, RNA molecules may be generated by in vitro and in vivo transcription of DNA sequences encoding PRTS. Such DNA sequences may be incorporated into a wide variety of vectors with suitable RNA polymerase promoters such as T7 or SP6. Alternatively, these cDNA constructs that synthesize complementary RNA, constitutively or inducibly, can be introduced into cell lines, cells, or tissues.

RNA molecules may be modified to increase intracellular stability and half-life. Possible modifications include, but are not limited to, the addition of flanking sequences at the 5′ and/or 3′ ends of the molecule, or the use of phosphorothioate or 2′ O-methyl rather than phosphodiester linkages within the backbone of the molecule. This concept is inherent in the production of PNAs and can be extended in all of these molecules by the inclusion of nontraditional bases such as inosine, queosine, and wybutosine, as well as acetyl-, methyl-, thio-, and similarly modified forms of adenine, cytidine, guanine, thymine, and uridine which are not as easily recognized by endogenous endonucleases.

An additional embodiment of the invention encompasses a method for screening for a compound which is effective in altering expression of a polynucleotide encoding PRTS. Compounds which may be effective in altering expression of a specific polynucleotide may include, but are not limited to, oligonucleotides, antisense oligonucleotides, triple helix-forming oligonucleotides, transcription factors and other polypeptide transcriptional regulators, and non-macromolecular chemical entities which are capable of interacting with specific polynucleotide sequences. Effective compounds may alter polynucleotide expression by acting as either inhibitors or promoters of polynucleotide expression. Thus, in the treatment of disorders associated with increased PRTS expression or activity, a compound which specifically inhibits expression of the polynucleotide encoding PRTS may be therapeutically useful, and in the treatment of disorders associated with decreased PRTS expression or activity, a compound which specifically promotes expression of the polynucleotide encoding PRTS may be therapeutically useful.

At least one, and up to a plurality, of test compounds may be screened for effectiveness in altering expression of a specific polynucleotide. A test compound may be obtained by any method commonly known in the art, including chemical modification of a compound known to be effective in altering polynucleotide expression; selection from an existing, commercially available or proprietary library of naturally occurring or non-natural chemical compounds; rational design of a compound based on chemical and/or structural properties of the target polynucleotide; and selection from a library of chemical compounds created combinatorially or randomly. A sample comprising a polynucleotide encoding PRTS is exposed to at least one test compound thus obtained. The sample may comprise, for example, an intact or permeabilized cell, or an in vitro cell-free or reconstituted biochemical system. Alterations in the expression of a polynucleotide encoding PRTS are assayed by any method commonly known in the art. Typically, the expression of a specific polynucleotide is detected by hybridization with a probe having a nucleotide sequence complementary to the sequence of the polynucleotide encoding PRTS. The amount of hybridization may be quantified, thus forming the basis for a comparison of the expression of the polynucleotide both with and without exposure to one or more test compounds. Detection of a change in the expression of a polynucleotide exposed to a test compound indicates that the test compound is effective in altering the expression of the polynucleotide. A screen for a compound effective in altering expression of a specific polynucleotide can be carried out, for example, using a Schizosaccharomyces pombe gene expression system (Atkins, D. et al. (1999) U.S. Pat. No. 5,932,435; Arndt, G. M. et al. (2000) Nucleic Acids Res. 28:E15) or a human cell line such as HeLa cells (Clarke, M. L. et al. (2000) Biochem. Biophys. Res. Commun. 268:8-13). A particular embodiment of the present invention involves screening a combinatorial library of oligonucleotides (such as deoxyribonucleotides, ribonucleotides, peptide nucleic acids, and modified oligonucleotides) for antisense activity against a specific polynucleotide sequence (Bruice, T. W. et al. (1997) U.S. Pat. No. 5,686,242; Bruice, T. W. et al. (2000) U.S. Pat. No. 6,022,691).
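By way of illustration only, the comparison of hybridization signal with and without exposure to a test compound might be expressed as a log2 fold change, as in the following Python sketch. The function names, the pseudocount of 1, and the 2-fold cutoff used to call a compound an inhibitor or a promoter are hypothetical choices for this example and are not taken from the screening systems cited above.

```python
import math

# Illustrative sketch only; the pseudocount and 2-fold cutoff are hypothetical.


def log2_fold_change(treated: float, untreated: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of hybridization signal with vs. without a test compound."""
    return math.log2((treated + pseudocount) / (untreated + pseudocount))


def classify_compounds(signals: dict[str, float], baseline: float,
                       min_fold: float = 2.0) -> dict[str, str]:
    """Label each compound as a promoter, an inhibitor, or having no effect."""
    cutoff = math.log2(min_fold)
    calls = {}
    for compound, signal in signals.items():
        change = log2_fold_change(signal, baseline)
        if change >= cutoff:
            calls[compound] = "promoter"
        elif change <= -cutoff:
            calls[compound] = "inhibitor"
        else:
            calls[compound] = "no effect"
    return calls


if __name__ == "__main__":
    hypothetical_signals = {"cmpd_A": 399.0, "cmpd_B": 24.0, "cmpd_C": 120.0}
    print(classify_compounds(hypothetical_signals, baseline=99.0))
```

A log scale treats a doubling and a halving of expression symmetrically, which suits a screen that looks for both inhibitors and promoters; the pseudocount guards against division by zero when a signal is absent.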

Many methods for introducing vectors into cells or tissues are available and equally suitable for use in vivo, in vitro, and ex vivo. For ex vivo therapy, vectors may be introduced into stem cells taken from the patient and clonally propagated for autologous transplant back into that same patient. Delivery by transfection, by liposome injections, or by polycationic amino polymers may be achieved using methods which are well known in the art. (See, e.g., Goldman, C. K. et al. (1997) Nat. Biotechnol. 15:462-466.)

Any of the therapeutic methods described above may be applied to any subject in need of such therapy, including, for example, mammals such as humans, dogs, cats, cows, horses, rabbits, and monkeys.

An additional embodiment of the invention relates to the administration of a composition which generally comprises an active ingredient formulated with a pharmaceutically acceptable excipient. Excipients may include, for example, sugars, starches, celluloses, gums, and proteins. Various formulations are commonly known and are thoroughly discussed in the latest edition of Remington's Pharmaceutical Sciences (Mack Publishing, Easton Pa.). Such compositions may consist of PRTS, antibodies to PRTS, and mimetics, agonists, antagonists, or inhibitors of PRTS.

The compositions utilized in this invention may be administered by any number of routes including, but not limited to, oral, intravenous, intramuscular, intra-arterial, intramedullary, intrathecal, intraventricular, pulmonary, transdermal, subcutaneous, intraperitoneal, intranasal, enteral, topical, sublingual, or rectal means.

Compositions for pulmonary administration may be prepared in liquid or dry powder form. These compositions are generally aerosolized immediately prior to inhalation by the patient. In the case of small molecules (e.g., traditional low molecular weight organic drugs), aerosol delivery of fast-acting formulations is well known in the art. In the case of macromolecules (e.g., larger peptides and proteins), recent developments in the field of pulmonary delivery via the alveolar region of the lung have enabled the practical delivery of drugs such as insulin to blood circulation (see, e.g., Patton, J. S. et al., U.S. Pat. No. 5,997,848). Pulmonary delivery has the advantage of administration without needle injection, and obviates the need for potentially toxic penetration enhancers.

Compositions suitable for use in the invention include compositions wherein the active ingredients are contained in an effective amount to achieve the intended purpose. The determination of an effective dose is well within the capability of those skilled in the art.

Specialized forms of compositions may be prepared for direct intracellular delivery of macromolecules comprising PRTS or fragments thereof. For example, liposome preparations containing a cell-impermeable macromolecule may promote cell fusion and intracellular delivery of the macromolecule. Alternatively, PRTS or a fragment thereof may be joined to a short cationic N-terminal portion from the HIV Tat-1 protein. Fusion proteins thus generated have been found to transduce into the cells of all tissues, including the brain, in a mouse model system (Schwarze, S. R. et al. (1999) Science 285:1569-1572).

For any compound, the therapeutically effective dose can be estimated initially either in cell culture assays, e.g., of neoplastic cells, or in animal models such as mice, rats, rabbits, dogs, monkeys, or pigs. An animal model may also be used to determine the appropriate concentration range and route of administration. Such information can then be used to determine useful doses and routes for administration in humans.

A therapeutically effective dose refers to that amount of active ingredient, for example PRTS or fragments thereof, antibodies to PRTS, and agonists, antagonists, or inhibitors of PRTS, which ameliorates the symptoms or condition. Therapeutic efficacy and toxicity may be determined by standard pharmaceutical procedures in cell cultures or with experimental animals, such as by calculating the ED₅₀ (the dose therapeutically effective in 50% of the population) or LD₅₀ (the dose lethal to 50% of the population) statistics. The dose ratio of toxic to therapeutic effects is the therapeutic index, which can be expressed as the LD₅₀/ED₅₀ ratio. Compositions which exhibit large therapeutic indices are preferred. The data obtained from cell culture assays and animal studies are used to formulate a range of dosage for human use. The dosage contained in such compositions is preferably within a range of circulating concentrations that includes the ED₅₀ with little or no toxicity. The dosage varies within this range depending upon the dosage form employed, the sensitivity of the patient, and the route of administration.
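By way of illustration only, the ED₅₀, LD₅₀, and therapeutic index described above might be estimated from dose-response data as in the following Python sketch. It interpolates linearly on a log-dose scale between the two doses that bracket a 50% response; this is a deliberate simplification, and probit or logistic curve fitting is more usual in practice. The function names and the example data are hypothetical.

```python
import math

# Illustrative sketch only. Log-linear interpolation is a simplification;
# probit or logistic curve fitting is more usual for dose-response data.


def dose_at_half_response(doses: list[float], fractions: list[float]) -> float:
    """Estimate the dose at which 50% of subjects respond (ED50 or LD50).

    doses must be positive and ascending; fractions are the matching
    proportions of subjects responding and must not decrease.
    """
    points = list(zip(doses, fractions))
    for (d0, f0), (d1, f1) in zip(points, points[1:]):
        if f0 <= 0.5 <= f1 and f1 > f0:
            t = (0.5 - f0) / (f1 - f0)  # position of 50% between the two doses
            log_dose = math.log10(d0) + t * (math.log10(d1) - math.log10(d0))
            return 10 ** log_dose
    raise ValueError("the 50% response level is not bracketed by the data")


def therapeutic_index(ld50: float, ed50: float) -> float:
    """Therapeutic index expressed as the LD50/ED50 ratio."""
    return ld50 / ed50


if __name__ == "__main__":
    # Hypothetical data: fraction of animals showing a therapeutic effect / dying.
    ed50 = dose_at_half_response([1, 10, 100], [0.2, 0.8, 1.0])
    ld50 = dose_at_half_response([100, 1000, 10000], [0.0, 0.4, 0.6])
    print(round(ed50, 2), round(ld50, 1), round(therapeutic_index(ld50, ed50)))
```

With these made-up numbers the ED₅₀ is about 3.16 dose units and the LD₅₀ about 3162 units, giving a therapeutic index of roughly 1000, the kind of large index the text describes as preferred.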

The exact dosage will be determined by the practitioner, in light of factors related to the subject requiring treatment. Dosage and administration are adjusted to provide sufficient levels of the active moiety or to maintain the desired effect. Factors which may be taken into account include the severity of the disease state, the general health of the subject, the age, weight, and gender of the subject, time and frequency of administration, drug combination(s), reaction sensitivities, and response to therapy. Long-acting compositions may be administered every 3 to 4 days, every week, or biweekly depending on the half-life and clearance rate of the particular formulation.

Normal dosage amounts may vary from about 0.1 μg to 100,000 μg, up to a total dose of about 1 gram, depending upon the route of administration. Guidance as to particular dosages and methods of delivery is provided in the literature and generally available to practitioners in the art. Those skilled in the art will employ different formulations for nucleotides than for proteins or their inhibitors. Similarly, delivery of polynucleotides or polypeptides will be specific to particular cells, conditions, locations, etc.

Diagnostics

In another embodiment, antibodies which specifically bind PRTS may be used for the diagnosis of disorders characterized by expression of PRTS, or in assays to monitor patients being treated with PRTS or agonists, antagonists, or inhibitors of PRTS. Antibodies useful for diagnostic purposes may be prepared in the same manner as described above for therapeutics. Diagnostic assays for PRTS include methods which utilize the antibody and a label to detect PRTS in human body fluids or in extracts of cells or tissues. The antibodies may be used with or without modification, and may be labeled by covalent or non-covalent attachment of a reporter molecule. A wide variety of reporter molecules, several of which are described above, are known in the art and may be used.

A variety of protocols for measuring PRTS, including ELISAs, RIAs, and FACS, are known in the art and provide a basis for diagnosing altered or abnormal levels of PRTS expression. Normal or standard values for PRTS expression are established by combining body fluids or cell extracts taken from normal mammalian subjects, for example, human subjects, with antibodies to PRTS under conditions suitable for complex formation. The amount of standard complex formation may be quantitated by various methods, such as photometric means. Quantities of PRTS expressed in subject, control, and disease samples from biopsied tissues are compared with the standard values. Deviation between standard and subject values establishes the parameters for diagnosing disease.

In another embodiment of the invention, the polynucleotides encoding PRTS may be used for diagnostic purposes. The polynucleotides which may be used include oligonucleotide sequences, complementary RNA and DNA molecules, and PNAs. The polynucleotides may be used to detect and quantify gene expression in biopsied tissues in which expression of PRTS may be correlated with disease. The diagnostic assay may be used to determine absence, presence, and excess expression of PRTS, and to monitor regulation of PRTS levels during therapeutic intervention.

In one aspect, hybridization with PCR probes which are capable of detecting polynucleotide sequences, including genomic sequences, encoding PRTS or closely related molecules may be used to identify nucleic acid sequences which encode PRTS. The specificity of the probe, whether it is made from a highly specific region, e.g., the 5′ regulatory region, or from a less specific region, e.g., a conserved motif, and the stringency of the hybridization or amplification will determine whether the probe identifies only naturally occurring sequences encoding PRTS, allelic variants, or related sequences.

Probes may also be used for the detection of related sequences, and may have at least 50% sequence identity to any of the PRTS encoding sequences. The hybridization probes of the subject invention may be DNA or RNA and may be derived from the sequence of SEQ ID NO:18-34 or from genomic sequences including promoters, enhancers, and introns of the PRTS gene.
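By way of illustration only, the following Python sketch computes percent sequence identity between a candidate probe and a PRTS-encoding sequence that have already been aligned, and checks it against the 50% figure mentioned above. How identity is counted varies between alignment tools; here gaps count as mismatches and columns that are gaps in both sequences are ignored, which is an assumption of this example. The function names are hypothetical, and the alignment itself is taken as given.

```python
# Illustrative sketch only. Identity definitions differ between tools; here gaps
# count as mismatches and columns that are gaps in both sequences are ignored.


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two sequences that have already been aligned."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment contains no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 50.0) -> bool:
    """True if the aligned pair reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    print(percent_identity("AC-T", "ACGT"))          # 75.0
    print(meets_identity_threshold("AAAA", "ACCC"))  # False (25% identity)
```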

Means for producing specific hybridization probes for DNAs encoding PRTS include the cloning of polynucleotide sequences encoding PRTS or PRTS derivatives into vectors for the production of mRNA probes. Such vectors are known in the art, are commercially available, and may be used to synthesize RNA probes in vitro by means of the addition of the appropriate RNA polymerases and the appropriate labeled nucleotides. Hybridization probes may be labeled by a variety of reporter groups, for example, by radionuclides such as ³²P or ³⁵S, or by enzymatic labels, such as alkaline phosphatase coupled to the probe via avidin/biotin coupling systems, and the like.

Polynucleotide sequences encoding PRTS may be used for the diagnosis of disorders associated with expression of PRTS. Examples of such disorders include, but are not limited to, a gastrointestinal disorder, such as dysphagia, peptic esophagitis, esophageal spasm, esophageal stricture, esophageal carcinoma, dyspepsia, indigestion, gastritis, gastric carcinoma, anorexia, nausea, emesis, gastroparesis, antral or pyloric edema, abdominal angina, pyrosis, gastroenteritis, intestinal obstruction, infections of the intestinal tract, peptic ulcer, cholelithiasis, cholecystitis, cholestasis, pancreatitis, pancreatic carcinoma, biliary tract disease, hepatitis, hyperbilirubinemia, cirrhosis, passive congestion of the liver, hepatoma, infectious colitis, ulcerative colitis, ulcerative proctitis, Crohn's disease, Whipple's disease, Mallory-Weiss syndrome, colonic carcinoma, colonic obstruction, irritable bowel syndrome, short bowel syndrome, diarrhea, constipation, gastrointestinal hemorrhage, acquired immunodeficiency syndrome (AIDS) enteropathy, jaundice, hepatic encephalopathy, hepatorenal syndrome, hepatic steatosis, hemochromatosis, Wilson's disease, alpha₁-antitrypsin deficiency, Reye's syndrome, primary sclerosing cholangitis, liver infarction, portal vein obstruction and thrombosis, centrilobular necrosis, peliosis hepatis, hepatic vein thrombosis, veno-occlusive disease, preeclampsia, eclampsia, acute fatty liver of pregnancy, intrahepatic cholestasis of pregnancy, and hepatic tumors including nodular hyperplasias, adenomas, and carcinomas; a cardiovascular disorder, such as arteriovenous fistula, atherosclerosis, hypertension, vasculitis, Raynaud's disease, aneurysms, arterial dissections, varicose veins, thrombophlebitis and phlebothrombosis, vascular tumors, and complications of thrombolysis, balloon angioplasty, vascular replacement, and coronary artery bypass graft surgery, congestive heart failure, ischemic heart disease, angina pectoris, myocardial infarction, hypertensive heart disease,
degenerative valvular heart disease, calcific aortic valve stenosis, congenitally bicuspid aortic valve, mitral annular calcification, mitral valve prolapse, rheumatic fever and rheumatic heart disease, infective endocarditis, nonbacterial thrombotic endocarditis, endocarditis of systemic lupus erythematosus, carcinoid heart disease, cardiomyopathy, myocarditis, pericarditis, neoplastic heart disease, congenital heart disease, and complications of cardiac transplantation; an autoimmune/inflammatory disorder, such as acquired immunodeficiency syndrome (AIDS), Addison's disease, adult respiratory distress syndrome, allergies, ankylosing spondylitis, amyloidosis, anemia, asthma, atherosclerosis, atherosclerotic plaque rupture, autoimmune hemolytic anemia, autoimmune thyroiditis, autoimmune polyendocrinopathy-candidiasis-ectodermal dystrophy (APECED), bronchitis, cholecystitis, contact dermatitis, Crohn's disease, atopic dermatitis, dermatomyositis, diabetes mellitus, emphysema, episodic lymphopenia with lymphocytotoxins, erythroblastosis fetalis, erythema nodosum, atrophic gastritis, glomerulonephritis, Goodpasture's syndrome, gout, Graves' disease, Hashimoto's thyroiditis, hypereosinophilia, irritable bowel syndrome, multiple sclerosis, myasthenia gravis, myocardial or pericardial inflammation, osteoarthritis, degradation of articular cartilage, osteoporosis, pancreatitis, polymyositis, psoriasis, Reiter's syndrome, rheumatoid arthritis, scleroderma, Sjögren's syndrome, systemic anaphylaxis, systemic lupus erythematosus, systemic sclerosis, thrombocytopenic purpura, ulcerative colitis, uveitis, Werner syndrome, complications of cancer, hemodialysis, and extracorporeal circulation, viral, bacterial, fungal, parasitic, protozoal, and helminthic infections, and trauma; a cell proliferative disorder such as actinic keratosis, arteriosclerosis, atherosclerosis, bursitis, cirrhosis, hepatitis, mixed connective tissue disease (MCTD), myelofibrosis, paroxysmal nocturnal hemoglobinuria, polycythemia vera,
psoriasis, primary thrombocythemia, and cancers including adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and, in particular, cancers of the adrenal gland, bladder, bone, bone marrow, brain, breast, cervix, gall bladder, ganglia, gastrointestinal tract, heart, kidney, liver, lung, muscle, ovary, pancreas, parathyroid, penis, prostate, salivary glands, skin, spleen, testis, thymus, thyroid, and uterus; a developmental disorder, such as renal tubular acidosis, anemia, Cushing's syndrome, achondroplastic dwarfism, Duchenne and Becker muscular dystrophy, bone resorption, epilepsy, gonadal dysgenesis, WAGR syndrome (Wilms' tumor, aniridia, genitourinary abnormalities, and mental retardation), Smith-Magenis syndrome, myelodysplastic syndrome, hereditary mucoepithelial dysplasia, hereditary keratodermas, hereditary neuropathies such as Charcot-Marie-Tooth disease and neurofibromatosis, hypothyroidism, hydrocephalus, seizure disorders such as Sydenham's chorea and cerebral palsy, spina bifida, anencephaly, craniorachischisis, congenital glaucoma, cataract, age-related macular degeneration, and sensorineural hearing loss; an epithelial disorder, such as dyshidrotic eczema, allergic contact dermatitis, keratosis pilaris, melasma, vitiligo, actinic keratosis, basal cell carcinoma, squamous cell carcinoma, seborrheic keratosis, folliculitis, herpes simplex, herpes zoster, varicella, candidiasis, dermatophytosis, scabies, insect bites, cherry angioma, keloid, dermatofibroma, acrochordons, urticaria, transient acantholytic dermatosis, xerosis, eczema, atopic dermatitis, contact dermatitis, hand eczema, nummular eczema, lichen simplex chronicus, asteatotic eczema, stasis dermatitis and stasis ulceration, seborrheic dermatitis, psoriasis, lichen planus, pityriasis rosea, impetigo, ecthyma, dermatophytosis, tinea versicolor, warts, acne vulgaris, acne rosacea, pemphigus vulgaris, pemphigus foliaceus, paraneoplastic pemphigus, bullous pemphigoid, herpes gestationis, dermatitis
herpetiformis, linear IgA disease, epidermolysis bullosa acquisita, dermatomyositis, lupus erythematosus, scleroderma and morphea, erythroderma, alopecia, figurate skin lesions, telangiectasias, hypopigmentation, hyperpigmentation, vesicles/bullae, exanthems, cutaneous drug reactions, papulonodular skin lesions, chronic non-healing wounds, photosensitivity diseases, epidermolysis bullosa simplex, epidermolytic hyperkeratosis, epidermolytic and nonepidermolytic palmoplantar keratoderma, ichthyosis bullosa of Siemens, ichthyosis exfoliativa, keratosis palmaris et plantaris, keratosis palmoplantaris, palmoplantar keratoderma, keratosis punctata, Meesmann's corneal dystrophy, pachyonychia congenita, white sponge nevus, steatocystoma multiplex, epidermal nevi/epidermolytic hyperkeratosis type, monilethrix, trichothiodystrophy, chronic hepatitis/cryptogenic cirrhosis, and colorectal hyperplasia; a neurological disorder, such as epilepsy, ischemic cerebrovascular disease, stroke, cerebral neoplasms, Alzheimer's disease, Pick's disease, Huntington's disease, dementia, Parkinson's disease and other extrapyramidal disorders, amyotrophic lateral sclerosis and other motor neuron disorders, progressive neural muscular atrophy, retinitis pigmentosa, hereditary ataxias, multiple sclerosis and other demyelinating diseases, bacterial and viral meningitis, brain abscess, subdural empyema, epidural abscess, suppurative intracranial thrombophlebitis, myelitis and radiculitis, viral central nervous system disease, prion diseases including kuru, Creutzfeldt-Jakob disease, and Gerstmann-Straussler-Scheinker syndrome, fatal familial insomnia, nutritional and metabolic diseases of the nervous system, neurofibromatosis, tuberous sclerosis, cerebelloretinal hemangioblastomatosis, encephalotrigeminal syndrome, mental retardation and other developmental disorders of the central nervous system including Down syndrome, cerebral palsy, neuroskeletal disorders, autonomic nervous system disorders, cranial nerve disorders,
spinal cord diseases, muscular dystrophy and other neuromuscular disorders, peripheral nervous system disorders, dermatomyositis and polymyositis, inherited, metabolic, endocrine, and toxic myopathies, myasthenia gravis, periodic paralysis, mental disorders including mood, anxiety, and schizophrenic disorders, seasonal affective disorder (SAD), akathisia, amnesia, catatonia, diabetic neuropathy, tardive dyskinesia, dystonias, paranoid psychoses, postherpetic neuralgia, Tourette's disorder, progressive supranuclear palsy, corticobasal degeneration, and familial frontotemporal dementia; and a reproductive disorder, such as infertility, including tubal disease, ovulatory defects, and endometriosis, a disorder of prolactin production, a disruption of the estrous cycle, a disruption of the menstrual cycle, polycystic ovary syndrome, ovarian hyperstimulation syndrome, an endometrial or ovarian tumor, a uterine fibroid, autoimmune disorders, an ectopic pregnancy, and teratogenesis; cancer of the breast, fibrocystic breast disease, and galactorrhea; a disruption of spermatogenesis, abnormal sperm physiology, cancer of the testis, cancer of the prostate, benign prostatic hyperplasia, prostatitis, Peyronie's disease, impotence, carcinoma of the male breast, and gynecomastia. The polynucleotide sequences encoding PRTS may be used in Southern or northern analysis, dot blot, or other membrane-based technologies; in PCR technologies; in dipstick, pin, and multiformat ELISA-like assays; and in microarrays utilizing fluids or tissues from patients to detect altered PRTS expression. Such qualitative or quantitative methods are well known in the art.

In a particular aspect, the nucleotide sequences encoding PRTS may be useful in assays that detect the presence of associated disorders, particularly those mentioned above. The nucleotide sequences encoding PRTS may be labeled by standard methods and added to a fluid or tissue sample from a patient under conditions suitable for the formation of hybridization complexes. After a suitable incubation period, the sample is washed and the signal is quantified and compared with a standard value. If the amount of signal in the patient sample is significantly altered in comparison to a control sample, then the presence of altered levels of nucleotide sequences encoding PRTS in the sample indicates the presence of the associated disorder. Such assays may also be used to evaluate the efficacy of a particular therapeutic treatment regimen in animal studies or in clinical trials, or to monitor the treatment of an individual patient.

In order to provide a basis for the diagnosis of a disorder associated with expression of PRTS, a normal or standard profile for expression is established. This may be accomplished by combining body fluids or cell extracts taken from normal subjects, either animal or human, with a sequence, or a fragment thereof, encoding PRTS, under conditions suitable for hybridization or amplification. Standard hybridization may be quantified by comparing the values obtained from normal subjects with values from an experiment in which a known amount of a substantially purified polynucleotide is used. Standard values obtained in this manner may be compared with values obtained from samples from patients who are symptomatic for a disorder. Deviation from standard values is used to establish the presence of a disorder.
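By way of illustration only, one simple way to express "deviation from standard values" is the number of standard deviations by which a patient's hybridization signal differs from the mean signal of normal subjects, as in the following Python sketch. The function names and the two-standard-deviation cutoff are hypothetical choices for this example, not a clinical criterion taught by the text.

```python
import statistics

# Illustrative sketch only; the two-standard-deviation cutoff is a hypothetical
# choice, not a clinical criterion.


def deviation_score(patient_value: float, normal_values: list[float]) -> float:
    """Number of standard deviations between a patient value and the normal mean."""
    mean = statistics.mean(normal_values)
    sd = statistics.stdev(normal_values)  # sample SD; needs at least two normal values
    if sd == 0:
        raise ValueError("normal values show no variation")
    return (patient_value - mean) / sd


def deviates_from_standard(patient_value: float, normal_values: list[float],
                           cutoff: float = 2.0) -> bool:
    """True if the patient value lies outside mean +/- cutoff * SD of normals."""
    return abs(deviation_score(patient_value, normal_values)) > cutoff


if __name__ == "__main__":
    normals = [10.0, 12.0, 14.0]  # hypothetical signals from normal subjects
    print(deviation_score(18.0, normals))         # 3.0
    print(deviates_from_standard(13.0, normals))  # False
```

The same score, recomputed at each follow-up assay, also gives a simple numerical way to track whether a patient's expression is returning toward the normal range during treatment.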

Once the presence of a disorder is established and a treatment protocol is initiated, hybridization assays may be repeated on a regular basis to determine if the level of expression in the patient begins to approximate that which is observed in the normal subject. The results obtained from successive assays may be used to show the efficacy of treatment over a period ranging from several days to months.

With respect to cancer, the presence of an abnormal amount of transcript (either under- or overexpressed) in biopsied tissue from an individual may indicate a predisposition for the development of the disease, or may provide a means for detecting the disease prior to the appearance of actual clinical symptoms. A more definitive diagnosis of this type may allow health professionals to employ preventative measures or aggressive treatment earlier, thereby preventing the development or further progression of the cancer.

Additional diagnostic uses for oligonucleotides designed from the sequences encoding PRTS may involve the use of PCR. These oligomers may be chemically synthesized, generated enzymatically, or produced in vitro. Oligomers will preferably contain a fragment of a polynucleotide encoding PRTS, or a fragment of a polynucleotide complementary to the polynucleotide encoding PRTS, and will be employed under optimized conditions for identification of a specific gene or condition. Oligomers may also be employed under less stringent conditions for detection or quantification of closely related DNA or RNA sequences.

In a particular aspect, oligonucleotide primers derived from the polynucleotide sequences encoding PRTS may be used to detect single nucleotide polymorphisms (SNPs). SNPs are substitutions, insertions, and deletions that are a frequent cause of inherited or acquired genetic disease in humans. Methods of SNP detection include, but are not limited to, single-stranded conformation polymorphism (SSCP) and fluorescent SSCP (fSSCP) methods. In SSCP, oligonucleotide primers derived from the polynucleotide sequences encoding PRTS are used to amplify DNA using the polymerase chain reaction (PCR). The DNA may be derived, for example, from diseased or normal tissue, biopsy samples, bodily fluids, and the like. SNPs in the DNA cause differences in the secondary and tertiary structures of PCR products in single-stranded form, and these differences are detectable using gel electrophoresis in non-denaturing gels. In fSSCP, the oligonucleotide primers are fluorescently labeled, which allows detection of the amplimers in high-throughput equipment such as DNA sequencing machines. Additionally, sequence database analysis methods, termed in silico SNP (isSNP), are capable of identifying polymorphisms by comparing the sequence of individual overlapping DNA fragments which assemble into a common consensus sequence. These computer-based methods filter out sequence variations due to laboratory preparation of DNA and sequencing errors using statistical models and automated analyses of DNA sequence chromatograms. In the alternative, SNPs may be detected and characterized by mass spectrometry using, for example, the high throughput MASSARRAY system (Sequenom, Inc., San Diego Calif.).
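By way of illustration only, the core of the in silico SNP idea, comparing overlapping fragments that assemble into a common consensus, might be sketched in Python as follows. Each fragment is assumed to be already placed on the consensus at a known offset. A simple minimum read count stands in for the statistical models and chromatogram analysis described above, so this sketch is far cruder than a real isSNP pipeline; the function names and the threshold are hypothetical.

```python
from collections import Counter, defaultdict

# Illustrative sketch only. Reads are assumed to be already placed on the
# consensus (offset = 0-based consensus position of the read's first base).
# A simple read-count cutoff stands in for statistical error models.


def column_counts(reads: list[tuple[int, str]]) -> dict[int, Counter]:
    """Tally the bases observed at each consensus position."""
    counts: dict[int, Counter] = defaultdict(Counter)
    for offset, seq in reads:
        for i, base in enumerate(seq.upper()):
            if base in "ACGT":  # ignore ambiguous calls such as N
                counts[offset + i][base] += 1
    return counts


def candidate_snps(reads: list[tuple[int, str]],
                   min_minor_reads: int = 2) -> dict[int, dict[str, int]]:
    """Return consensus positions where a second allele is seen in enough reads.

    Requiring the minor allele in several independent reads is a crude way of
    discarding isolated sequencing or preparation errors.
    """
    snps = {}
    for position, tally in sorted(column_counts(reads).items()):
        ranked = tally.most_common()
        if len(ranked) >= 2 and ranked[1][1] >= min_minor_reads:
            snps[position] = dict(tally)
    return snps


if __name__ == "__main__":
    overlapping = [(0, "ACGTA"), (0, "ACGTA"), (1, "CGAA"), (2, "GAA")]
    print(candidate_snps(overlapping))  # {3: {'T': 2, 'A': 2}}
```

A base seen in only one fragment is treated as a probable error rather than a polymorphism, which is the simplest form of the error filtering the text attributes to these methods.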

Methods which may also be used to quantify the expression of PRTS include radiolabeling or biotinylating nucleotides, coamplification of a control nucleic acid, and interpolating results from standard curves. (See, e.g., Melby, P. C. et al. (1993) J. Immunol. Methods 159:235-244; Duplaa, C. et al. (1993) Anal. Biochem. 212:229-236.) The speed of quantitation of multiple samples may be accelerated by running the assay in a high-throughput format where the oligomer or polynucleotide of interest is presented in various dilutions and a spectrophotometric or colorimetric response gives rapid quantitation.
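By way of illustration only, interpolating a sample result from a standard curve might look like the following Python sketch, which uses piecewise-linear interpolation between standards of known amount. The function name, the requirement that standard signals increase strictly, and the choice of linear (rather than, for example, log-linear or fitted) interpolation are assumptions for this example.

```python
from bisect import bisect_left

# Illustrative sketch only: piecewise-linear interpolation on a standard curve.


def interpolate_amount(signal: float, std_signals: list[float],
                       std_amounts: list[float]) -> float:
    """Estimate the amount in a sample from its signal using a standard curve.

    std_signals must be strictly increasing and paired with std_amounts
    (the known amounts of standard that produced them).
    """
    if len(std_signals) != len(std_amounts) or len(std_signals) < 2:
        raise ValueError("need at least two paired standards")
    if not std_signals[0] <= signal <= std_signals[-1]:
        raise ValueError("signal lies outside the standard curve; dilute or extend the curve")
    i = bisect_left(std_signals, signal)  # first standard with signal >= sample
    if std_signals[i] == signal:
        return std_amounts[i]
    x0, x1 = std_signals[i - 1], std_signals[i]
    y0, y1 = std_amounts[i - 1], std_amounts[i]
    return y0 + (signal - x0) * (y1 - y0) / (x1 - x0)


if __name__ == "__main__":
    # Hypothetical standards: signal units vs. known amount of control nucleic acid.
    print(interpolate_amount(700.0, [100.0, 500.0, 900.0], [0.0, 10.0, 20.0]))  # 15.0
```

Rejecting signals outside the curve, rather than extrapolating, mirrors the practice of presenting samples in various dilutions so that at least one falls within the calibrated range.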

In further embodiments, oligonucleotides or longer fragments derived from any of the polynucleotide sequences described herein may be used as elements on a microarray. The microarray can be used in transcript imaging techniques which monitor the relative expression levels of large numbers of genes simultaneously, as described below. The microarray may also be used to identify genetic variants, mutations, and polymorphisms. This information may be used to determine gene function, to understand the genetic basis of a disorder, to diagnose a disorder, to monitor progression/regression of disease as a function of gene expression, and to develop and monitor the activities of therapeutic agents in the treatment of disease. In particular, this information may be used to develop a pharmacogenomic profile of a patient in order to select the most appropriate and effective treatment regimen for that patient. For example, therapeutic agents which are highly effective and display the fewest side effects may be selected for a patient based on his/her pharmacogenomic profile.

In another embodiment, PRTS, fragments of PRTS, or antibodies specific for PRTS may be used as elements on a microarray. The microarray may be used to monitor or measure protein-protein interactions, drug-target interactions, and gene expression profiles, as described above.

A particular embodiment relates to the use of the polynucleotides of the present invention to generate a transcript image of a tissue or cell type. A transcript image represents the global pattern of gene expression by a particular tissue or cell type. Global gene expression patterns are analyzed by quantifying the number of expressed genes and their relative abundance under given conditions and at a given time. (See Seilhamer et al., “Comparative Gene Transcript Analysis,” U.S. Pat. No. 5,840,484, expressly incorporated by reference herein.) Thus a transcript image may be generated by hybridizing the polynucleotides of the present invention or their complements to the totality of transcripts or reverse transcripts of a particular tissue or cell type. In one embodiment, the hybridization takes place in high-throughput format, wherein the polynucleotides of the present invention or their complements comprise a subset of a plurality of elements on a microarray. The resultant transcript image would provide a profile of gene activity.

Transcript images may be generated using transcripts isolated from tissues, cell lines, biopsies, or other biological samples. The transcript image may thus reflect gene expression in vivo, as in the case of a tissue or biopsy sample, or in vitro, as in the case of a cell line.

Transcript images which profile the expression of the polynucleotides of the present invention may also be used in conjunction with in vitro model systems and preclinical evaluation of pharmaceuticals, as well as toxicological testing of industrial and naturally-occurring environmental compounds. All compounds induce characteristic gene expression patterns, frequently termed molecular fingerprints or toxicant signatures, which are indicative of mechanisms of action and toxicity (Nuwaysir, E. F. et al. (1999) Mol. Carcinog. 24:153-159; Steiner, S. and N. L. Anderson (2000) Toxicol. Lett. 112-113:467-471, expressly incorporated by reference herein). If a test compound has a signature similar to that of a compound with known toxicity, it is likely to share those toxic properties. These fingerprints or signatures are most useful and refined when they contain expression information from a large number of genes and gene families. Ideally, a genome-wide measurement of expression provides the highest quality signature. Even genes whose expression is not altered by any tested compound are important, because their expression levels are used to normalize the rest of the expression data. The normalization procedure is useful for comparison of expression data after treatment with different compounds. While the assignment of gene function to elements of a toxicant signature aids in interpretation of toxicity mechanisms, knowledge of gene function is not necessary for the statistical matching of signatures which leads to prediction of toxicity. (See, for example, Press Release 00-02 from the National Institute of Environmental Health Sciences, released Feb. 29, 2000, available at http://www.niehs.nih.gov/oc/news/toxchip.htm.) Therefore, it is important and desirable in toxicological screening using toxicant signatures to include all expressed gene sequences.
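Normalizing against genes whose expression is unaltered by the tested compounds can be sketched as a single scaling step. This is one simple scheme chosen for illustration, assuming a mean ratio over the invariant genes; the source does not specify the normalization procedure, and a median or other robust estimator may be preferable in practice.

```python
def normalize_to_invariant_genes(sample, reference, invariant_genes):
    """Rescale `sample` so its invariant genes match `reference` on average.

    sample, reference: dicts mapping gene ID -> raw expression level.
    invariant_genes: IDs of genes assumed to be unaltered by the tested
    compounds (hypothetical input; must be present and nonzero in `sample`).
    """
    # Ratio of reference to sample level for each invariant gene.
    ratios = [reference[g] / sample[g] for g in invariant_genes]
    factor = sum(ratios) / len(ratios)
    # Apply the same scale factor to every gene in the sample.
    return {gene: level * factor for gene, level in sample.items()}
```

If the invariant genes read at half their reference levels in a treated sample, every gene in that sample is scaled up by a factor of two before signatures are compared.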

In one embodiment, the toxicity of a test compound is assessed by treating a biological sample containing nucleic acids with the test compound. Nucleic acids that are expressed in the treated biological sample are hybridized with one or more probes specific to the polynucleotides of the present invention, so that transcript levels corresponding to the polynucleotides of the present invention may be quantified. The transcript levels in the treated biological sample are compared with levels in an untreated biological sample. Differences in the transcript levels between the two samples are indicative of a toxic response caused by the test compound in the treated sample.
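The treated-versus-untreated comparison can be sketched as a fold-change filter. The source does not state what size of difference counts as indicative, so the two-fold cutoff (`min_log2_fold=1.0`) and the pseudocount that guards against zero counts are hypothetical choices for illustration.

```python
import math


def differential_transcripts(treated, untreated, min_log2_fold=1.0, pseudocount=1.0):
    """Return genes whose log2 fold change meets a (hypothetical) cutoff.

    treated, untreated: dicts mapping gene ID -> quantified transcript level.
    Only genes measured in both samples are compared.
    """
    changed = {}
    for gene in treated.keys() & untreated.keys():
        lfc = math.log2((treated[gene] + pseudocount) / (untreated[gene] + pseudocount))
        if abs(lfc) >= min_log2_fold:
            changed[gene] = lfc
    return changed
```

A gene rising from 99 to 399 units yields a log2 fold change of 2.0 and is reported, while an unchanged gene is not. Statistical testing across replicates would normally accompany such a filter.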

Another particular embodiment relates to the use of the polypeptide sequences of the present invention to analyze the proteome of a tissue or cell type. The term proteome refers to the global pattern of protein expression in a particular tissue or cell type. Each protein component of a proteome can be subjected individually to further analysis. Proteome expression patterns, or profiles, are analyzed by quantifying the number of expressed proteins and their relative abundance under given conditions and at a given time. A profile of a cell's proteome may thus be generated by separating and analyzing the polypeptides of a particular tissue or cell type. In one embodiment, the separation is achieved using two-dimensional gel electrophoresis, in which proteins from a sample are separated by isoelectric focusing in the first dimension, and then according to molecular weight by sodium dodecyl sulfate slab gel electrophoresis in the second dimension (Steiner and Anderson, supra). The proteins are visualized in the gel as discrete and uniquely positioned spots, typically by staining the gel with an agent such as Coomassie Blue or silver or fluorescent stains. The optical density of each protein spot is generally proportional to the level of the protein in the sample. The optical densities of equivalently positioned protein spots from different samples, for example, from biological samples either treated or untreated with a test compound or therapeutic agent, are compared to identify any changes in protein spot density related to the treatment. The proteins in the spots are partially sequenced using, for example, standard methods employing chemical or enzymatic cleavage followed by mass spectrometry. The identity of the protein in a spot may be determined by comparing its partial sequence, preferably of at least 5 contiguous amino acid residues, to the polypeptide sequences of the present invention. In some cases, further sequence data may be obtained for definitive protein identification.
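Matching a partial sequence of at least 5 contiguous residues against a set of reference polypeptides can be sketched as an exact substring search. This is a simplification: it assumes error-free partial sequences, whereas mass-spectrometry data may contain ambiguities (for example, leucine and isoleucine are isobaric). The sequences and IDs in the usage example are made-up toy values, not the sequences of the invention.

```python
def match_partial_sequence(fragment, reference_polypeptides, min_length=5):
    """Return IDs of reference polypeptides that contain `fragment` exactly.

    fragment: partial amino acid sequence (one-letter codes).
    reference_polypeptides: dict mapping sequence ID -> full sequence.
    min_length: minimum number of contiguous residues required.
    """
    fragment = fragment.upper()
    if len(fragment) < min_length:
        raise ValueError(f"need at least {min_length} contiguous residues")
    return [
        seq_id
        for seq_id, seq in reference_polypeptides.items()
        if fragment in seq.upper()
    ]
```

With toy references `{"protA": "MKTAYIAKQRQISFVKSHFSRQ", "protB": "MSSHEGGKKKALKQPKKQAKEM"}`, the fragment `"AYIAK"` maps only to `protA`. A fragment found in more than one reference would call for the further sequence data mentioned above.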

A proteomic profile may also be generated using antibodies specific for PRTS to quantify the levels of PRTS expression. In one embodiment, the antibodies are used as elements on a microarray, and protein expression levels are quantified by exposing the microarray to the sample and detecting the levels of protein bound to each array element (Lueking, A. et al. (1999) Anal. Biochem. 270:103-111; Mendoza, L. G. et al. (1999) Biotechniques 27:778-788). Detection may be performed by a variety of methods known in the art, for example, by reacting the proteins in the sample with a thiol- or amino-reactive fluorescent compound and detecting the amount of fluorescence bound at each array element.

Toxicant signatures at the proteome level are also useful for toxicological screening, and should be analyzed in parallel with toxicant signatures at the transcript level. There is a poor correlation between transcript and protein abundances for some proteins in some tissues (Anderson, N. L. and J. Seilhamer (1997) Electrophoresis 18:533-537), so proteome toxicant signatures may be useful in the analysis of compounds which do not significantly affect the transcript image, but which alter the proteomic profile. In addition, the analysis of transcripts in body fluids is difficult, due to rapid degradation of mRNA, so proteomic profiling may be more reliable and informative in such cases.

In another embodiment, the toxicity of a test compound is assessed by treating a biological sample containing proteins with the test compound. Proteins that are expressed in the treated biological sample are separated so that the amount of each protein can be quantified. The amount of each protein is compared to the amount of the corresponding protein in an untreated biological sample. A difference in the amount of protein between the two samples is indicative of a toxic response to the test compound in the treated sample. Individual proteins are identified by sequencing the amino acid residues of the individual proteins and comparing these partial sequences to the polypeptides of the present invention.

In another embodiment, the toxicity of a test compound is assessed by treating a biological sample containing proteins with the test compound. Proteins from the biological sample are incubated with antibodies specific to the polypeptides of the present invention. The amount of protein recognized by the antibodies is quantified. The amount of protein in the treated biological sample is compared with the amount in an untreated biological sample. A difference in the amount of protein between the two samples is indicative of a toxic response to the test compound in the treated sample.

Microarrays may be prepared, used, and analyzed using methods known in the art. (See, e.g., Brennan, T. M. et al. (1995) U.S. Pat. No. 5,474,796; Schena, M. et al. (1996) Proc. Natl. Acad. Sci. USA 93:10614-10619; Baldeschweiler et al. (1995) PCT application WO95/251116; Shalon, D. et al. (1995) PCT application WO95/35505; Heller, R. A. et al. (1997) Proc. Natl. Acad. Sci. USA 94:2150-2155; and Heller, M. J. et al. (1997) U.S. Pat. No. 5,605,662.) Various types of microarrays are well known and thoroughly described in DNA Microarrays: A Practical Approach, M. Schena, ed. (1999) Oxford University Press, London, hereby expressly incorporated by reference.

In another embodiment of the invention, nucleic acid sequences encoding PRTS may be used to generate hybridization probes useful in mapping the naturally occurring genomic sequence. Either coding or noncoding sequences may be used, and in some instances, noncoding sequences may be preferable over coding sequences. For example, conservation of a coding sequence among members of a multi-gene family may potentially cause undesired cross-hybridization during chromosomal mapping. The sequences may be mapped to a particular chromosome, to a specific region of a chromosome, or to artificial chromosome constructions, e.g., human artificial chromosomes (HACs), yeast artificial chromosomes (YACs), bacterial artificial chromosomes (BACs), bacterial P1 constructions, or single chromosome cDNA libraries. (See, e.g., Harrington, J. J. et al. (1997) Nat. Genet. 15:345-355; Price, C. M. (1993) Blood Rev. 7:127-134; and Trask, B. J. (1991) Trends Genet. 7:149-154.) Once mapped, the nucleic acid sequences of the invention may be used to develop genetic linkage maps, for example, which correlate the inheritance of a disease state with the inheritance of a particular chromosome region or restriction fragment length polymorphism (RFLP). (See, for example, Lander, E. S. and D. Botstein (1986) Proc. Natl. Acad. Sci. USA 83:7353-7357.)

Fluorescent in situ hybridization (FISH) may be correlated with other physical and genetic map data. (See, e.g., Heinz-Ulrich, et al. (1995) in Meyers, supra, pp. 965-968.) Examples of genetic map data can be found in various scientific journals or at the Online Mendelian Inheritance in Man (OMIM) World Wide Web site. Correlation between the location of the gene encoding PRTS on a physical map and a specific disorder, or a predisposition to a specific disorder, may help define the region of DNA associated with that disorder and thus may further positional cloning efforts.

In situ hybridization of chromosomal preparations and physical mapping techniques, such as linkage analysis using established chromosomal markers, may be used for extending genetic maps. Often the placement of a gene on the chromosome of another mammalian species, such as mouse, may reveal associated markers even if the exact chromosomal locus is not known. This information is valuable to investigators searching for disease genes using positional cloning or other gene discovery techniques. Once the gene or genes responsible for a disease or syndrome have been crudely localized by genetic linkage to a particular genomic region, e.g., ataxia-telangiectasia to 11q22-23, any sequences mapping to that area may represent associated or regulatory genes for further investigation. (See, e.g., Gatti, R. A. et al. (1988) Nature 336:577-580.) The nucleotide sequence of the instant invention may also be used to detect differences in the chromosomal location due to translocation, inversion, etc., among normal, carrier, or affected individuals.

In another embodiment of the invention, PRTS, its catalytic or immunogenic fragments, or oligopeptides thereof can be used for screening libraries of compounds in any of a variety of drug screening techniques. The fragment employed in such screening may be free in solution, affixed to a solid support, borne on a cell surface, or located intracellularly. The formation of binding complexes between PRTS and the agent being tested may be measured.

Another technique for drug screening provides for high-throughput screening of compounds having suitable binding affinity to the protein of interest. (See, e.g., Geysen, et al. (1984) PCT application WO84/03564.) In this method, large numbers of different small test compounds are synthesized on a solid substrate. The test compounds are reacted with PRTS, or fragments thereof, and washed. Bound PRTS is then detected by methods well known in the art. Purified PRTS can also be coated directly onto plates for use in the aforementioned drug screening techniques. Alternatively, non-neutralizing antibodies can be used to capture the peptide and immobilize it on a solid support.

In another embodiment, one may use competitive drug screening assays in which neutralizing antibodies capable of binding PRTS specifically compete with a test compound for binding PRTS. In this manner, antibodies can be used to detect the presence of any peptide which shares one or more antigenic determinants with PRTS.

In additional embodiments, the nucleotide sequences which encode PRTS may be used in any molecular biology techniques that have yet to be developed, provided the new techniques rely on properties of nucleotide sequences that are currently known, including, but not limited to, such properties as the triplet genetic code and specific base pair interactions.

Without further elaboration, it is believed that one skilled in the art can, using the preceding description, utilize the present invention to its fullest extent. The following embodiments are, therefore, to be construed as merely illustrative, and not limitative of the remainder of the disclosure in any way whatsoever.

The disclosures of all patents, applications, and publications mentioned above and below, including U.S. Ser. No. 60/231,039, U.S. Ser. No. 60/232,812, U.S. Ser. No. 60/234,850, U.S. Ser. No. 60/236,500, U.S. Ser. No. 60/238,773, and U.S. Ser. No. 60/239,658, are hereby expressly incorporated by reference.

EXAMPLES

I. Construction of cDNA Libraries

Incyte cDNAs were derived from cDNA libraries described in the LIFESEQ GOLD database (Incyte Genomics, Palo Alto Calif.) and shown in Table 4, column 5. Some tissues were homogenized and lysed in guanidinium isothiocyanate, while others were homogenized and lysed in phenol or in a suitable mixture of denaturants, such as TRIZOL (Life Technologies), a monophasic solution of phenol and guanidine isothiocyanate. The resulting lysates were centrifuged over CsCl cushions or extracted with chloroform. RNA was precipitated from the lysates with either isopropanol or sodium acetate and ethanol, or by other routine methods.

Phenol extraction and precipitation of RNA were repeated as necessary to increase RNA purity. In some cases, RNA was treated with DNase. For most libraries, poly(A)+ RNA was isolated using oligo d(T)-coupled paramagnetic particles (Promega), OLIGOTEX latex particles (QIAGEN, Chatsworth Calif.), or an OLIGOTEX mRNA purification kit (QIAGEN). Alternatively, RNA was isolated directly from tissue lysates using other RNA isolation kits, e.g., the POLY(A)PURE mRNA purification kit (Ambion, Austin Tex.).

In some cases, Stratagene was provided with RNA and constructed the corresponding cDNA libraries. Otherwise, cDNA was synthesized and cDNA libraries were constructed with the UNIZAP vector system (Stratagene) or SUPERSCRIPT plasmid system (Life Technologies), using the recommended procedures or similar methods known in the art. (See, e.g., Ausubel, 1997, supra, units 5.1-6.6.) Reverse transcription was initiated using oligo d(T) or random primers. Synthetic oligonucleotide adapters were ligated to double-stranded cDNA, and the cDNA was digested with the appropriate restriction enzyme or enzymes. For most libraries, the cDNA was size-selected (300-1000 bp) using SEPHACRYL S1000, SEPHAROSE CL2B, or SEPHAROSE CL4B column chromatography (Amersham Pharmacia Biotech) or preparative agarose gel electrophoresis. cDNAs were ligated into compatible restriction enzyme sites of the polylinker of a suitable plasmid, e.g., PBLUESCRIPT plasmid (Stratagene), PSPORT1 plasmid (Life Technologies), PCDNA2.1 plasmid (Invitrogen, Carlsbad Calif.), PBK-CMV plasmid (Stratagene), PCR2-TOPOTA (Invitrogen), PCMV-ICIS (Stratagene), or pINCY (Incyte Genomics, Palo Alto Calif.), or derivatives thereof. Recombinant plasmids were transformed into competent E. coli cells including XL1-Blue, XL1-BlueMRF, or SOLR from Stratagene or DH5α, DH10B, or ElectroMAX DH10B from Life Technologies.

II. Isolation of cDNA Clones

Plasmids obtained as described in Example I were recovered from host cells by in vivo excision using the UNIZAP vector system (Stratagene) or by cell lysis. Plasmids were purified using at least one of the following: a Magic or WIZARD Minipreps DNA purification system (Promega); an AGTC Miniprep purification kit (Edge Biosystems, Gaithersburg Md.); and QIAWELL 8 Plasmid, QIAWELL 8 Plus Plasmid, QIAWELL 8 Ultra Plasmid purification systems or the R.E.A.L. PREP 96 plasmid purification kit from QIAGEN. Following precipitation, plasmids were resuspended in 0.1 ml of distilled water and stored, with or without lyophilization, at 4° C.

Alternatively, plasmid DNA was amplified from host cell lysates using direct link PCR in a high-throughput format (Rao, V. B. (1994) Anal. Biochem. 216:1-14). Host cell lysis and thermal cycling steps were carried out in a single reaction mixture. Samples were processed and stored in 384-well plates, and the concentration of amplified plasmid DNA was quantified fluorometrically using PICOGREEN dye (Molecular Probes, Eugene Oreg.) and a FLUOROSKAN II fluorescence scanner (Labsystems Oy, Helsinki, Finland).

III. Sequencing and Analysis

Incyte cDNAs recovered in plasmids as described in Example II were sequenced as follows. Sequencing reactions were processed using standard methods or high-throughput instrumentation such as the ABI CATALYST 800 (Applied Biosystems) thermal cycler or the PTC-200 thermal cycler (MJ Research) in conjunction with the HYDRA microdispenser (Robbins Scientific) or the MICROLAB 2200 (Hamilton) liquid transfer system. cDNA sequencing reactions were prepared using reagents provided by Amersham Pharmacia Biotech or supplied in ABI sequencing kits such as the ABI PRISM BIGDYE Terminator cycle sequencing ready reaction kit (Applied Biosystems). Electrophoretic separation of cDNA sequencing reactions and detection of labeled polynucleotides were carried out using the MEGABACE 1000 DNA sequencing system (Molecular Dynamics); the ABI PRISM 373 or 377 sequencing system (Applied Biosystems) in conjunction with standard ABI protocols and base calling software; or other sequence analysis systems known in the art. Reading frames within the cDNA sequences were identified using standard methods (reviewed in Ausubel, 1997, supra, unit 7.7). Some of the cDNA sequences were selected for extension using the techniques disclosed in Example VIII.

The polynucleotide sequences derived from Incyte cDNAs were validated by removing vector, linker, and poly(A) sequences and by masking ambiguous bases, using algorithms and programs based on BLAST, dynamic programming, and dinucleotide nearest neighbor analysis. The Incyte cDNA sequences or translations thereof were then queried against a selection of public databases such as the GenBank primate, rodent, mammalian, vertebrate, and eukaryote databases, and BLOCKS, PRINTS, DOMO, PRODOM, and hidden Markov model (HMM)-based protein family databases such as PFAM. (HMM is a probabilistic approach which analyzes consensus primary structures of gene families. See, for example, Eddy, S. R. (1996) Curr. Opin. Struct. Biol. 6:361-365.) The queries were performed using programs based on BLAST, FASTA, BLIMPS, and HMMER. The Incyte cDNA sequences were assembled to produce full length polynucleotide sequences. Alternatively, GenBank cDNAs, GenBank ESTs, stitched sequences, stretched sequences, or Genscan-predicted coding sequences (see Examples IV and V) were used to extend Incyte cDNA assemblages to full length. Assembly was performed using programs based on Phred, Phrap, and Consed, and cDNA assemblages were screened for open reading frames using programs based on GeneMark, BLAST, and FASTA. The full length polynucleotide sequences were translated to derive the corresponding full length polypeptide sequences. Alternatively, a polypeptide of the invention may begin at any of the methionine residues of the full length translated polypeptide. Full length polypeptide sequences were subsequently analyzed by querying against databases such as the GenBank protein databases (genpept), SwissProt, BLOCKS, PRINTS, DOMO, PRODOM, Prosite, and hidden Markov model (HMM)-based protein family databases such as PFAM. Full length polynucleotide sequences are also analyzed using MACDNASIS PRO software (Hitachi Software Engineering, South San Francisco Calif.) and LASERGENE software (DNASTAR). Polynucleotide and polypeptide sequence alignments are generated using default parameters specified by the CLUSTAL algorithm as incorporated into the MEGALIGN multisequence alignment program (DNASTAR), which also calculates the percent identity between aligned sequences.
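Two of the validation steps named above, removing a 3′ poly(A) tail and masking ambiguous bases, can be sketched with regular expressions. This is an illustrative simplification only: it does not perform vector or linker removal, and the minimum tail length (`min_poly_a`) is a hypothetical parameter, not a value taken from the programs actually used.

```python
import re


def clean_read(seq, min_poly_a=8):
    """Strip a 3' poly(A) tail and mask non-ACGT bases as N.

    min_poly_a: hypothetical minimum run of terminal A residues treated
    as a poly(A) tail rather than genuine sequence.
    """
    seq = seq.upper()
    # Remove a terminal run of at least `min_poly_a` A residues.
    seq = re.sub(f"A{{{min_poly_a},}}$", "", seq)
    # Mask IUPAC ambiguity codes and any other non-ACGT character.
    return re.sub(r"[^ACGT]", "N", seq)
```

For example, `clean_read("acgtRYacgAAAAAAAAAA")` returns `"ACGTNNACG"`, while a short terminal run such as the one in `"ACGTAAA"` is left in place.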

Table 7 summarizes the tools, programs, and algorithms used for the analysis and assembly of Incyte cDNA and full length sequences and provides applicable descriptions, references, and threshold parameters. The first column of Table 7 shows the tools, programs, and algorithms used, the second column provides brief descriptions thereof, the third column presents appropriate references, all of which are incorporated by reference herein in their entirety, and the fourth column presents, where applicable, the scores, probability values, and other parameters used to evaluate the strength of a match between two sequences (the higher the score or the lower the probability value, the greater the identity between two sequences).

The programs described above for the assembly and analysis of full length polynucleotide and polypeptide sequences were also used to identify polynucleotide sequence fragments from SEQ ID NO:18-34. Fragments from about 20 to about 4000 nucleotides which are useful in hybridization and amplification technologies are described in Table 4, column 4.

IV. Identification and Editing of Coding Sequences from Genomic DNA

Putative proteases were initially identified by running the Genscan gene identification program against public genomic sequence databases (e.g., gbpri and gbhtg). Genscan is a general-purpose gene identification program which analyzes genomic DNA sequences from a variety of organisms (See Burge, C. and S. Karlin (1997) J. Mol. Biol. 268:78-94, and Burge, C. and S. Karlin (1998) Curr. Opin. Struct. Biol. 8:346-354). The program concatenates predicted exons to form an assembled cDNA sequence extending from a methionine to a stop codon. The output of Genscan is a FASTA database of polynucleotide and polypeptide sequences. The maximum range of sequence for Genscan to analyze at once was set to 30 kb. To determine which of these Genscan-predicted cDNA sequences encode proteases, the encoded polypeptides were analyzed by querying against PFAM models for proteases. Potential proteases were also identified by homology to Incyte cDNA sequences that had been annotated as proteases. These selected Genscan-predicted sequences were then compared by BLAST analysis to the genpept and gbpri public databases. Where necessary, the Genscan-predicted sequences were then edited by comparison to the top BLAST hit from genpept to correct errors in the sequence predicted by Genscan, such as extra or omitted exons. BLAST analysis was also used to find any Incyte cDNA or public cDNA coverage of the Genscan-predicted sequences, thus providing evidence for transcription. When Incyte cDNA coverage was available, this information was used to correct or confirm the Genscan-predicted sequence. Full length polynucleotide sequences were obtained by assembling Genscan-predicted coding sequences with Incyte cDNA sequences and/or public cDNA sequences using the assembly process described in Example III. Alternatively, full length polynucleotide sequences were derived entirely from edited or unedited Genscan-predicted coding sequences.

V. Assembly of Genomic Sequence Data with cDNA Sequence Data

“Stitched” Sequences

Partial cDNA sequences were extended with exons predicted by the Genscan gene identification program described in Example IV. Partial cDNAs assembled as described in Example III were mapped to genomic DNA and parsed into clusters containing related cDNAs and Genscan exon predictions from one or more genomic sequences. Each cluster was analyzed using an algorithm based on graph theory and dynamic programming to integrate cDNA and genomic information, generating possible splice variants that were subsequently confirmed, edited, or extended to create a full length sequence. Sequence intervals in which the entire length of the interval was present on more than one sequence in the cluster were identified, and intervals thus identified were considered to be equivalent by transitivity. For example, if an interval was present on a cDNA and two genomic sequences, then all three intervals were considered to be equivalent. This process allows unrelated but consecutive genomic sequences to be brought together, bridged by cDNA sequence. Intervals thus identified were then “stitched” together by the stitching algorithm in the order that they appear along their parent sequences to generate the longest possible sequence, as well as sequence variants. Linkages between intervals which proceed along one type of parent sequence (cDNA to cDNA or genomic sequence to genomic sequence) were given preference over linkages which change parent type (cDNA to genomic sequence). The resultant stitched sequences were translated and compared by BLAST analysis to the genpept and gbpri public databases. Incorrect exons predicted by Genscan were corrected by comparison to the top BLAST hit from genpept. Sequences were further extended with additional cDNA sequences, or by inspection of genomic DNA, when necessary.
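Treating shared intervals as equivalent by transitivity amounts to computing connected components, which can be sketched with a union-find (disjoint-set) structure. This illustrates only the equivalence step, not the graph-theory and dynamic-programming stitching algorithm itself, and the interval labels in the usage example are hypothetical.

```python
def equivalence_classes(pairs):
    """Group intervals into classes, treating shared presence as transitive.

    pairs: iterable of (interval_a, interval_b) judged to be the same
    sequence interval (e.g., present on a cDNA and on a genomic sequence).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two classes

    classes = {}
    for x in list(parent):
        classes.setdefault(find(x), set()).add(x)
    # Deterministic order for reporting.
    return sorted(classes.values(), key=lambda s: sorted(s))
```

Mirroring the example in the text, an interval on a cDNA that matches intervals on two genomic sequences places all three in one class, even though the two genomic intervals were never compared directly.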

“Stretched” Sequences

Partial DNA sequences were extended to full length with an algorithm based on BLAST analysis. First, partial cDNAs assembled as described in Example III were queried against public databases such as the GenBank primate, rodent, mammalian, vertebrate, and eukaryote databases using the BLAST program. The nearest GenBank protein homolog was then compared by BLAST analysis to either Incyte cDNA sequences or Genscan exon-predicted sequences described in Example IV. A chimeric protein was generated by using the resultant high-scoring segment pairs (HSPs) to map the translated sequences onto the GenBank protein homolog. Insertions or deletions may occur in the chimeric protein with respect to the original GenBank protein homolog. The GenBank protein homolog, the chimeric protein, or both were used as probes to search for homologous genomic sequences from the public human genome databases. Partial DNA sequences were therefore “stretched” or extended by the addition of homologous genomic sequences. The resultant stretched sequences were examined to determine whether they contained a complete gene.

VI. Chromosomal Mapping of PRTS Encoding Polynucleotides

The sequences which were used to assemble SEQ ID NO:18-34 were compared with sequences from the Incyte LIFESEQ database and public domain databases using BLAST and other implementations of the Smith-Waterman algorithm. Sequences from these databases that matched SEQ ID NO:18-34 were assembled into clusters of contiguous and overlapping sequences using assembly algorithms such as Phrap (Table 7). Radiation hybrid and genetic mapping data available from public resources such as the Stanford Human Genome Center (SHGC), Whitehead Institute for Genome Research (WIGR), and Généthon were used to determine if any of the clustered sequences had been previously mapped. Inclusion of a mapped sequence in a cluster resulted in the assignment of all sequences of that cluster, including its particular SEQ ID NO:, to that map location.
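The rule that a mapped member assigns its location to every sequence in its cluster can be sketched as follows. Leaving clusters with conflicting locations unassigned is an assumption added for illustration; the source does not say how such conflicts were handled. All IDs and location strings in the usage example are hypothetical.

```python
def assign_map_locations(clusters, known_locations):
    """Propagate a mapped member's location to every sequence in its cluster.

    clusters: list of sets of sequence IDs (contiguous/overlapping sequences).
    known_locations: dict of sequence ID -> previously mapped location.
    Clusters with no mapped member, or with conflicting locations, are
    left unassigned (an illustrative assumption).
    """
    assigned = {}
    for cluster in clusters:
        locations = {known_locations[s] for s in cluster if s in known_locations}
        if len(locations) == 1:
            location = locations.pop()
            for seq_id in cluster:
                assigned[seq_id] = location
    return assigned
```

A cluster containing one previously mapped radiation hybrid marker thus assigns that marker's interval to the query sequence and to every EST in the same cluster.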

Map locations are represented by ranges, or intervals, of human chromosomes. The map position of an interval, in centiMorgans, is measured relative to the terminus of the chromosome's p-arm. (The centiMorgan (cM) is a unit of measurement based on recombination frequencies between chromosomal markers. On average, 1 cM is roughly equivalent to 1 megabase (Mb) of DNA in humans, although this can vary widely due to hot and cold spots of recombination.) The cM distances are based on genetic markers mapped by Généthon which provide boundaries for radiation hybrid markers whose sequences were included in each of the clusters. Human genome maps and other resources available to the public, such as the NCBI “GeneMap'99” World Wide Web site (http://www.ncbi.nlm.nih.gov/genemap/), can be employed to determine if previously identified disease genes map within or in proximity to the intervals indicated below.

In this manner, SEQ ID NO:18 was mapped to chromosome 16 within the interval from 33.4 to 42.7 centiMorgans, and SEQ ID NO:22 was mapped to chromosome 1 within the interval from 219.2 to 222.7 centiMorgans.

VII. Analysis of Polynucleotide Expression

Northern analysis is a laboratory technique used to detect the presence of a transcript of a gene and involves the hybridization of a labeled nucleotide sequence to a membrane on which RNAs from a particular cell type or tissue have been bound. (See, e.g., Sambrook, supra, ch. 7; Ausubel (1995) supra, ch. 4 and 16.)

Analogous computer techniques applying BLAST were used to search for identical or related molecules in cDNA databases such as GenBank or LIFESEQ (Incyte Genomics). This analysis is much faster than multiple membrane-based hybridizations. In addition, the sensitivity of the computer search can be modified to determine whether any particular match is categorized as exact or similar. The basis of the search is the product score, which is defined as:

$$\text{product score} = \frac{\text{BLAST score} \times \text{percent identity}}{5 \times \min\{\text{length}(\text{Seq. } 1),\ \text{length}(\text{Seq. } 2)\}}$$

The product score takes into account both the degree of similarity between two sequences and the length of the sequence match. The product score is a normalized value between 0 and 100, and is calculated as follows: the BLAST score is multiplied by the percent nucleotide identity and the product is divided by (5 times the length of the shorter of the two sequences). The BLAST score is calculated by assigning a score of +5 for every base that matches in a high-scoring segment pair (HSP), and −4 for every mismatch. Two sequences may share more than one HSP (separated by gaps). If there is more than one HSP, then the HSP with the highest BLAST score is used to calculate the product score. The product score represents a balance between fractional overlap and quality in a BLAST alignment. For example, a product score of 100 is produced only for 100% identity over the entire length of the shorter of the two sequences being compared. A product score of 70 is produced either by 100% identity and 70% overlap at one end, or by 88% identity and 100% overlap. A product score of 50 is produced either by 100% identity and 50% overlap at one end, or by 79% identity and 100% overlap.
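The product score calculation described above can be sketched directly from its definition. Two assumptions are made for illustration: each HSP is summarized only by its counts of matched and mismatched bases (gaps are ignored), and percent identity is computed over the top-scoring HSP.

```python
def blast_score(matches, mismatches):
    """Score an HSP as described: +5 per matching base, -4 per mismatch."""
    return 5 * matches - 4 * mismatches


def product_score(hsps, len_seq1, len_seq2):
    """Product score of two sequences from their HSPs.

    hsps: list of (matches, mismatches) pairs; only the HSP with the
    highest BLAST score is used.
    """
    matches, mismatches = max(hsps, key=lambda h: blast_score(*h))
    percent_identity = 100.0 * matches / (matches + mismatches)
    return blast_score(matches, mismatches) * percent_identity / (5 * min(len_seq1, len_seq2))
```

For two 100-base sequences, a perfect match over the full length scores 100.0 and a perfect 70-base overlap scores 70.0. A full-length alignment at 88% identity scores about 69.0 under this model, consistent with the rounded figure of 70 given in the text.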

Alternatively, polynucleotide sequences encoding PRTS are analyzed with respect to the tissue sources from which they were derived. For example, some full length sequences are assembled, at least in part, with overlapping Incyte cDNA sequences (see Example III). Each cDNA sequence is derived from a cDNA library constructed from a human tissue. Each human tissue is classified into one of the following organ/tissue categories: cardiovascular system; connective tissue; digestive system; embryonic structures; endocrine system; exocrine glands; genitalia, female; genitalia, male; germ cells; hemic and immune system; liver; musculoskeletal system; nervous system; pancreas; respiratory system; sense organs; skin; stomatognathic system; unclassified/mixed; or urinary tract. The number of libraries in each category is counted and divided by the total number of libraries across all categories. Similarly, each human tissue is classified into one of the following disease/condition categories: cancer, cell line, developmental, inflammation, neurological, trauma, cardiovascular, pooled, and other, and the number of libraries in each category is counted and divided by the total number of libraries across all categories. The resulting percentages reflect the tissue- and disease-specific expression of cDNA encoding PRTS. cDNA sequences and cDNA library/tissue information are found in the LIFESEQ GOLD database (Incyte Genomics, Palo Alto Calif.).
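The counting step amounts to a normalized tally. The following minimal sketch assumes each contributing cDNA library has already been assigned one category label; the library list is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def category_percentages(library_categories: List[str]) -> Dict[str, float]:
    """Percentage of libraries in each category (one label per cDNA library)."""
    counts = Counter(library_categories)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


# Hypothetical organ/tissue labels of the libraries contributing to one PRTS cDNA
libs = ["nervous system", "nervous system", "digestive system", "urinary tract"]
print(category_percentages(libs))
# {'nervous system': 50.0, 'digestive system': 25.0, 'urinary tract': 25.0}
```

The same function applies unchanged to the disease/condition labels.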

VIII. Extension of PRTS Encoding Polynucleotides

Full length polynucleotide sequences were also produced by extension of an appropriate fragment of the full length molecule using oligonucleotide primers designed from this fragment. One primer was synthesized to initiate 5′ extension of the known fragment, and the other primer was synthesized to initiate 3′ extension of the known fragment. The initial primers were designed using OLIGO 4.06 software (National Biosciences), or another appropriate program, to be about 22 to 30 nucleotides in length, to have a GC content of about 50% or more, and to anneal to the target sequence at temperatures of about 68° C. to about 72° C. Any stretch of nucleotides which would result in hairpin structures and primer-primer dimerizations was avoided.
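The length and GC criteria above, together with a crude screen for self-complementarity, can be sketched as follows. This is not how OLIGO 4.06 works. The sketch does not model melting temperature at all, and it uses a hypothetical rule (any 6-mer of the primer also present in its own reverse complement) as a rough proxy for hairpin and primer-dimer risk.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_self_complementary_run(seq: str, k: int = 6) -> bool:
    """True if some k-mer of seq also occurs in its reverse complement, i.e. the
    primer could pair with itself (hairpin) or with a second copy (primer-dimer)."""
    rc = reverse_complement(seq)
    return any(seq[i:i + k] in rc for i in range(len(seq) - k + 1))


def passes_basic_checks(primer: str, min_len: int = 22, max_len: int = 30,
                        min_gc: float = 0.5, k: int = 6) -> bool:
    """Length 22-30 nt and GC >= 50% (from the text); k-mer screen is an assumption."""
    p = primer.upper()
    return (min_len <= len(p) <= max_len
            and gc_fraction(p) >= min_gc
            and not has_self_complementary_run(p, k))


print(passes_basic_checks("GGCAGCGAGGAGAAGGCAGAGGAG"))  # True
print(passes_basic_checks("AAAGAATTCAAAGGCCGGCCGGCC"))  # False (GAATTC is self-complementary)
```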

Selected human cDNA libraries were used to extend the sequence. If more than one extension was necessary or desired, additional or nested sets of primers were designed.

High fidelity amplification was obtained by PCR using methods well known in the art. PCR was performed in 96-well plates using the PTC-200 thermal cycler (MJ Research, Inc.). The reaction mix contained DNA template, 200 nmol of each primer, reaction buffer containing Mg²⁺, (NH₄)₂SO₄, and 2-mercaptoethanol, Taq DNA polymerase (Amersham Pharmacia Biotech), ELONGASE enzyme (Life Technologies), and Pfu DNA polymerase (Stratagene), with the following parameters for primer pair PCI A and PCI B: Step 1: 94° C., 3 min; Step 2: 94° C., 15 sec; Step 3: 60° C., 1 min; Step 4: 68° C., 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68° C., 5 min; Step 7: storage at 4° C. In the alternative, the parameters for primer pair T7 and SK+ were as follows: Step 1: 94° C., 3 min; Step 2: 94° C., 15 sec; Step 3: 57° C., 1 min; Step 4: 68° C., 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68° C., 5 min; Step 7: storage at 4° C.
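The thermal cycling program can be represented as data, which makes it easy to check its total programmed hold time. The sketch below assumes "repeated 20 times" means 20 cycles in total and ignores ramp times between temperatures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    temp_c: float
    seconds: int


def hold_time_minutes(initial: Step, cycle: List[Step], final: Step, cycles: int) -> float:
    """Total programmed hold time in minutes; ramping and 4 C storage are ignored."""
    total = initial.seconds + cycles * sum(s.seconds for s in cycle) + final.seconds
    return total / 60


# PCI A / PCI B program: Steps 1-4 and 6 from the text
initial = Step(94, 180)
cycle = [Step(94, 15), Step(60, 60), Step(68, 120)]
final = Step(68, 300)
print(hold_time_minutes(initial, cycle, final, cycles=20))  # 73.0
```

Reading Step 5 instead as 21 total cycles adds one 195 s cycle, giving 76.25 min. The T7/SK+ program differs only in its annealing temperature (57° C.), so its hold time is the same.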

The concentration of DNA in each well was determined by dispensing 100 μl PICOGREEN quantitation reagent (0.25% (v/v) PICOGREEN; Molecular Probes, Eugene Oreg.) dissolved in 1×TE and 0.5 μl of undiluted PCR product into each well of an opaque fluorimeter plate (Corning Costar, Acton Mass.), allowing the DNA to bind to the reagent. The plate was scanned in a Fluoroskan II (Labsystems Oy, Helsinki, Finland) to measure the fluorescence of the sample and to quantify the concentration of DNA. A 5 μl to 10 μl aliquot of the reaction mixture was analyzed by electrophoresis on a 1% agarose gel to determine which reactions were successful in extending the sequence.
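The text does not say how fluorescence was converted to concentration. A common approach, assumed here, is a linear standard curve built from DNA standards of known concentration prepared in the same way. The standards and readings below are hypothetical.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def dna_concentration(fluorescence: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate concentration from a well's fluorescence."""
    return (fluorescence - intercept) / slope


# Hypothetical DNA standards (ng/ul) and their fluorescence readings (arbitrary units)
std_conc = [0.0, 10.0, 20.0, 40.0]
std_fluor = [5.0, 105.0, 205.0, 405.0]
slope, intercept = fit_line(std_conc, std_fluor)
print(dna_concentration(255.0, slope, intercept))  # 25.0
```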

The extended nucleotides were desalted and concentrated, transferred to 384-well plates, digested with CviJI Chlorella virus endonuclease (Molecular Biology Research, Madison Wis.), and sonicated or sheared prior to religation into pUC 18 vector (Amersham Pharmacia Biotech). For shotgun sequencing, the digested nucleotides were separated on low concentration (0.6 to 0.8%) agarose gels, fragments were excised, and agar digested with Agar ACE (Promega). Extended clones were religated using T4 ligase (New England Biolabs, Beverly Mass.) into pUC 18 vector (Amersham Pharmacia Biotech), treated with Pfu DNA polymerase (Stratagene) to fill in restriction site overhangs, and transfected into competent E. coli cells. Transformed cells were selected on antibiotic-containing media, and individual colonies were picked and cultured overnight at 37° C. in 384-well plates in LB/2× carb liquid media.

The cells were lysed, and DNA was amplified by PCR using Taq DNA polymerase (Amersham Pharmacia Biotech) and Pfu DNA polymerase (Stratagene) with the following parameters: Step 1: 94° C., 3 min; Step 2: 94° C., 15 sec; Step 3: 60° C., 1 min; Step 4: 72° C., 2 min; Step 5: Steps 2, 3, and 4 repeated 29 times; Step 6: 72° C., 5 min; Step 7: storage at 4° C. DNA was quantified by PICOGREEN reagent (Molecular Probes) as described above. Samples with low DNA recoveries were reamplified using the same conditions as described above. Samples were diluted with 20% dimethyl sulfoxide (1:2, v/v) and sequenced using DYENAMIC energy transfer sequencing primers and the DYENAMIC DIRECT kit (Amersham Pharmacia Biotech) or the ABI PRISM BIGDYE Terminator cycle sequencing ready reaction kit (Applied Biosystems).

In like manner, full length polynucleotide sequences are verified using the above procedure or are used to obtain 5′ regulatory sequences using the above procedure along with oligonucleotides designed for such extension, and an appropriate genomic library.

IX. Labeling and Use of Individual Hybridization Probes

Hybridization probes derived from SEQ ID NO:18-34 are employed to screen cDNAs, genomic DNAs, or mRNAs. Although the labeling of oligonucleotides, consisting of about 20 base pairs, is specifically described, essentially the same procedure is used with larger nucleotide fragments. Oligonucleotides are designed using state-of-the-art software such as OLIGO 4.06 software (National Biosciences) and labeled by combining 50 pmol of each oligomer, 250 μCi of [γ-³²P] adenosine triphosphate (Amersham Pharmacia Biotech), and T4 polynucleotide kinase (DuPont NEN, Boston Mass.). The labeled oligonucleotides are substantially purified using a SEPHADEX G-25 superfine size exclusion dextran bead column (Amersham Pharmacia Biotech). An aliquot containing 10⁷ counts per minute of the labeled probe is used in a typical membrane-based hybridization analysis of human genomic DNA digested with one of the following endonucleases: Ase I, Bgl II, Eco RI, Pst I, Xba I, or Pvu II (DuPont NEN).

The DNA from each digest is fractionated on a 0.7% agarose gel and transferred to nylon membranes (Nytran Plus, Schleicher & Schuell, Durham N.H.). Hybridization is carried out for 16 hours at 40° C. To remove nonspecific signals, blots are sequentially washed at room temperature under conditions of up to, for example, 0.1× saline sodium citrate and 0.5% sodium dodecyl sulfate. Hybridization patterns are visualized using autoradiography or an alternative imaging means and compared.

X. Microarrays

The linkage or synthesis of array elements upon a microarray can be achieved utilizing photolithography, piezoelectric printing (ink-jet printing; see, e.g., Baldeschweiler, supra), mechanical microspotting technologies, and derivatives thereof. The substrate in each of the aforementioned technologies should be uniform and solid with a non-porous surface (Schena (1999), supra). Suggested substrates include silicon, silica, glass slides, glass chips, and silicon wafers. Alternatively, a procedure analogous to a dot or slot blot may also be used to arrange and link elements to the surface of a substrate using thermal, UV, chemical, or mechanical bonding procedures. A typical array may be produced using available methods and machines well known to those of ordinary skill in the art and may contain any appropriate number of elements. (See, e.g., Schena, M. et al. (1995) Science 270:467-470; Shalon, D. et al. (1996) Genome Res. 6:639-645; Marshall, A. and J. Hodgson (1998) Nat. Biotechnol. 16:27-31.)

Full length cDNAs, Expressed Sequence Tags (ESTs), or fragments or oligomers thereof may comprise the elements of the microarray. Fragments or oligomers suitable for hybridization can be selected using software well known in the art such as LASERGENE software (DNASTAR). The array elements are hybridized with polynucleotides in a biological sample. The polynucleotides in the biological sample are conjugated to a fluorescent label or other molecular tag for ease of detection. After hybridization, nonhybridized nucleotides from the biological sample are removed, and a fluorescence scanner is used to detect hybridization at each array element. Alternatively, laser desorption and mass spectrometry may be used for detection of hybridization. The degree of complementarity and the relative abundance of each polynucleotide which hybridizes to an element on the microarray may be assessed. In one embodiment, microarray preparation and usage are described in detail below.

Tissue or Cell Sample Preparation

Total RNA is isolated from tissue samples using the guanidinium thiocyanate method, and poly(A)⁺ RNA is purified using the oligo-(dT) cellulose method. Each poly(A)⁺ RNA sample is reverse transcribed using MMLV reverse transcriptase, 0.05 pg/μl oligo-(dT) primer (21mer), 1× first strand buffer, 0.03 units/μl RNase inhibitor, 500 μM dATP, 500 μM dGTP, 500 μM dTTP, 40 μM dCTP, and 40 μM dCTP-Cy3 (BDS) or dCTP-Cy5 (Amersham Pharmacia Biotech). The reverse transcription reaction is performed in a 25 μl volume containing 200 ng poly(A)⁺ RNA with GEMBRIGHT kits (Incyte). Specific control poly(A)⁺ RNAs are synthesized by in vitro transcription from non-coding yeast genomic DNA. After incubation at 37° C. for 2 hr, each reaction sample (one with Cy3 and another with Cy5 labeling) is treated with 2.5 μl of 0.5M sodium hydroxide and incubated for 20 minutes at 85° C. to stop the reaction and degrade the RNA. Samples are purified using two successive CHROMA SPIN 30 gel filtration spin columns (CLONTECH Laboratories, Inc. (CLONTECH), Palo Alto Calif.), and after combining, both reaction samples are ethanol precipitated using 1 μl of glycogen (1 mg/ml), 60 μl sodium acetate, and 300 μl of 100% ethanol. The sample is then dried to completion using a SpeedVAC (Savant Instruments Inc., Holbrook N.Y.) and resuspended in 14 μl 5×SSC/0.2% SDS.

Microarray Preparation

Sequences of the present invention are used to generate array elements. Each array element is amplified from bacterial cells containing vectors with cloned cDNA inserts. PCR amplification uses primers complementary to the vector sequences flanking the cDNA insert. Array elements are amplified in thirty cycles of PCR from an initial quantity of 1-2 ng to a final quantity greater than 5 μg. Amplified array elements are then purified using SEPHACRYL-400 (Amersham Pharmacia Biotech).
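The stated yields imply an average amplification factor per cycle, (final/initial)^(1/cycles). This back-of-the-envelope sketch uses only the figures in the text. Because the final quantity is "greater than 5 μg", the results are lower bounds. Perfect doubling would give 2.0, so an average of roughly 1.3 per cycle over thirty cycles is enough to reach these yields.

```python
def mean_fold_per_cycle(initial_ng: float, final_ng: float, cycles: int) -> float:
    """Average amplification factor per cycle (2.0 would be perfect doubling)."""
    return (final_ng / initial_ng) ** (1.0 / cycles)


for start_ng in (1.0, 2.0):
    factor = mean_fold_per_cycle(start_ng, 5000.0, 30)  # 5 ug = 5000 ng
    print(f"{start_ng} ng -> 5 ug: {factor:.3f}-fold per cycle")
```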

Purified array elements are immobilized on polymer-coated glass slides. Glass microscope slides (Corning) are cleaned by ultrasound in 0.1% SDS and acetone, with extensive distilled water washes between and after treatments. Glass slides are etched in 4% hydrofluoric acid (VWR Scientific Products Corporation (VWR), West Chester Pa.), washed extensively in distilled water, and coated with 0.05% aminopropyl silane (Sigma) in 95% ethanol. Coated slides are cured in a 110° C. oven.

Array elements are applied to the coated glass substrate using a procedure described in U.S. Pat. No. 5,807,522, incorporated herein by reference. 1 μl of the array element DNA, at an average concentration of 100 ng/μl, is loaded into the open capillary printing element by a high-speed robotic apparatus. The apparatus then deposits about 5 nl of array element sample per slide.
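The printing figures imply how much DNA each spot receives and how many slides a single load could supply. The sketch below is an idealized upper bound that assumes no dead volume, evaporation, or carry-over in the capillary.

```python
def ng_per_spot(conc_ng_per_ul: float, spot_volume_nl: float) -> float:
    """DNA mass deposited per spot (1 ul = 1000 nl)."""
    return conc_ng_per_ul * spot_volume_nl / 1000.0


def max_slides_per_load(load_ul: float, spot_volume_nl: float) -> int:
    """Upper bound on slides printed from one load, ignoring dead volume and evaporation."""
    return int(load_ul * 1000.0 // spot_volume_nl)


print(ng_per_spot(100.0, 5.0))        # 0.5
print(max_slides_per_load(1.0, 5.0))  # 200
```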

Microarrays are UV-crosslinked using a STRATALINKER UV-crosslinker (Stratagene). Microarrays are washed at room temperature once in 0.2% SDS and three times in distilled water. Non-specific binding sites are blocked by incubation of microarrays in 0.2% casein in phosphate buffered saline (PBS) (Tropix, Inc., Bedford Mass.) for 30 minutes at 60° C., followed by washes in 0.2% SDS and distilled water as before.

Hybridization

Hybridization reactions contain 9 μl of sample mixture consisting of 0.2 μg each of Cy3 and Cy5 labeled cDNA synthesis products in 5×SSC, 0.2% SDS hybridization buffer. The sample mixture is heated to 65° C. for 5 minutes, aliquoted onto the microarray surface, and covered with a 1.8 cm² coverslip. The arrays are transferred to a waterproof chamber having a cavity just slightly larger than a microscope slide. The chamber is kept at 100% humidity internally by the addition of 140 μl of 5×SSC in a corner of the chamber. The chamber containing the arrays is incubated for about 6.5 hours at 60° C. The arrays are washed for 10 min at 45° C. in a first wash buffer (0.1×SSC, 0.1% SDS), three times for 10 minutes each at 45° C. in a second wash buffer (0.1×SSC), and dried.

Detection

Reporter-labeled hybridization complexes are detected with a microscope equipped with an Innova 70 mixed gas 10 W laser (Coherent, Inc., Santa Clara Calif.) capable of generating spectral lines at 488 nm for excitation of Cy3 and at 632 nm for excitation of Cy5. The excitation laser light is focused on the array using a 20× microscope objective (Nikon, Inc., Melville N.Y.). The slide containing the array is placed on a computer-controlled X-Y stage on the microscope and raster-scanned past the objective. The 1.8 cm×1.8 cm array used in the present example is scanned with a resolution of 20 micrometers.

In two separate scans, a mixed gas multiline laser excites the two fluorophores sequentially. Emitted light is split, based on wavelength, into two photomultiplier tube detectors (PMT R1477, Hamamatsu Photonics Systems, Bridgewater N.J.) corresponding to the two fluorophores. Appropriate filters positioned between the array and the photomultiplier tubes are used to filter the signals. The emission maxima of the fluorophores used are 565 nm for Cy3 and 650 nm for Cy5. Each array is typically scanned twice, one scan per fluorophore using the appropriate filters at the laser source, although the apparatus is capable of recording the spectra from both fluorophores simultaneously.

The sensitivity of the scans is typically calibrated using the signal intensity generated by a cDNA control species added to the sample mixture at a known concentration. A specific location on the array contains a complementary DNA sequence, allowing the intensity of the signal at that location to be correlated with a weight ratio of hybridizing species of 1:100,000. When two samples from different sources (e.g., representing test and control cells), each labeled with a different fluorophore, are hybridized to a single array for the purpose of identifying genes that are differentially expressed, the calibration is done by labeling samples of the calibrating cDNA with the two fluorophores and adding identical amounts of each to the hybridization mixture.

The output of the photomultiplier tube is digitized using a 12-bit RTI-835H analog-to-digital (A/D) conversion board (Analog Devices, Inc., Norwood Mass.) installed in an IBM-compatible PC computer. The digitized data are displayed as an image where the signal intensity is mapped using a linear 20-color transformation to a pseudocolor scale ranging from blue (low signal) to red (high signal). The data are also analyzed quantitatively. Where two different fluorophores are excited and measured simultaneously, the data are first corrected for optical crosstalk (due to overlapping emission spectra) between the fluorophores using each fluorophore's emission spectrum.
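Both processing steps can be sketched briefly. The text does not describe how the 20 colors partition the 12-bit range, so equal-width bins are assumed. For crosstalk, the sketch uses standard two-channel linear unmixing with a 2×2 matrix of per-dye channel responses. The coefficients shown are hypothetical and would in practice come from the measured emission spectra or single-dye controls.

```python
from typing import Sequence, Tuple

ADC_MAX = 2 ** 12 - 1  # a 12-bit A/D board yields integer codes 0..4095
N_COLORS = 20


def pseudocolor_index(code: int) -> int:
    """Linearly map a 12-bit code to one of 20 color bins (0 = blue/low, 19 = red/high)."""
    return min(code * N_COLORS // (ADC_MAX + 1), N_COLORS - 1)


def correct_crosstalk(obs_cy3: float, obs_cy5: float,
                      m: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Linear unmixing: solve [obs_cy3, obs_cy5] = m @ [cy3, cy5].

    m[i][j] is the signal in channel i (0 = Cy3, 1 = Cy5) per unit of dye j.
    """
    (a, b), (c, d) = m
    det = a * d - b * c
    cy3 = (d * obs_cy3 - b * obs_cy5) / det
    cy5 = (a * obs_cy5 - c * obs_cy3) / det
    return cy3, cy5


print(pseudocolor_index(0), pseudocolor_index(2048), pseudocolor_index(4095))  # 0 10 19
# Hypothetical crosstalk: 10% of Cy5 signal appears in the Cy3 channel, 5% of Cy3 in Cy5
print(correct_crosstalk(1050.0, 550.0, [[1.0, 0.1], [0.05, 1.0]]))
```

The second call recovers approximately (1000, 500), the dye signals used to generate the hypothetical observations.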

A grid is superimposed over the fluorescence signal image such that the signal from each spot is centered in each element of the grid. The fluorescence signal within each element is then integrated to obtain a numerical value corresponding to the average intensity of the signal. The software used for signal analysis is the GEMTOOLS gene expression analysis program (Incyte).
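The per-element averaging can be sketched as below. GEMTOOLS internals are not described in the text. This sketch assumes a regular grid already aligned to the spots and performs no background subtraction. At the stated 20 μm resolution, the 1.8 cm array corresponds to 900 pixels per side.

```python
from typing import List


def grid_means(image: List[List[float]], rows: int, cols: int) -> List[List[float]]:
    """Mean pixel intensity in each element of a regular rows x cols grid.

    Assumes the grid is already aligned so each spot sits centered in one element;
    no background subtraction is performed.
    """
    cell_h, cell_w = len(image) // rows, len(image[0]) // cols
    means = []
    for r in range(rows):
        row_means = []
        for c in range(cols):
            pixels = [image[y][x]
                      for y in range(r * cell_h, (r + 1) * cell_h)
                      for x in range(c * cell_w, (c + 1) * cell_w)]
            row_means.append(sum(pixels) / len(pixels))
        means.append(row_means)
    return means


# A 4 x 4 toy image holding a 2 x 2 array of spots
image = [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
print(grid_means(image, 2, 2))  # [[1.0, 2.0], [3.0, 4.0]]
```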

XI. Complementary Polynucleotides

Sequences complementary to the PRTS-encoding sequences, or any parts thereof, are used to detect, decrease, or inhibit expression of naturally occurring PRTS. Although use of oligonucleotides comprising from about 15 to 30 base pairs is described, essentially the same procedure is used with smaller or with larger sequence fragments. Appropriate oligonucleotides are designed using OLIGO 4.06 software (National Biosciences) and the coding sequence of PRTS. To inhibit transcription, a complementary oligonucleotide is designed from the most unique 5′ sequence and used to prevent promoter binding to the coding sequence. To inhibit translation, a complementary oligonucleotide is designed to prevent ribosomal binding to the PRTS-encoding transcript.
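A complementary oligonucleotide is the reverse complement of the targeted window, written 5′ to 3′. The sketch below shows only that core step and none of OLIGO's selection criteria. The transcript fragment is hypothetical, and placing the window at the start codon is one illustrative way to target ribosome binding.

```python
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")  # RNA U pairs with A


def antisense_oligo(target: str, start: int, length: int = 20) -> str:
    """DNA oligo complementary and antiparallel to target[start:start + length].

    The result is written 5'->3', so it is the reverse complement of the window.
    """
    if not 15 <= length <= 30:
        raise ValueError("length outside the ~15-30 nt range discussed in the text")
    window = target[start:start + length].upper()
    if len(window) != length:
        raise ValueError("window runs past the end of the target sequence")
    return window.translate(COMPLEMENT)[::-1]


# Hypothetical 5' end of a PRTS-encoding transcript, beginning at the ATG start codon
transcript = "ATGGCCAAGTTTCCCGGGAAATTT"
print(antisense_oligo(transcript, 0, 15))  # GGGAAACTTGGCCAT
```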

XII. Expression of PRTS

Expression and purification of PRTS is achieved using bacterial or virus-based expression systems. For expression of PRTS in bacteria, cDNA is subcloned into an appropriate vector containing an antibiotic resistance gene and an inducible promoter that directs high levels of cDNA transcription. Examples of such promoters include, but are not limited to, the trp-lac (tac) hybrid promoter and the T5 or T7 bacteriophage promoter in conjunction with the lac operator regulatory element. Recombinant vectors are transformed into suitable bacterial hosts, e.g., BL21(DE3). Antibiotic resistant bacteria express PRTS upon induction with isopropyl beta-D-thiogalactopyranoside (IPTG). Expression of PRTS in eukaryotic cells is achieved by infecting insect or mammalian cell lines with recombinant Autographa californica nuclear polyhedrosis virus (AcMNPV), commonly known as baculovirus. The nonessential polyhedrin gene of baculovirus is replaced with cDNA encoding PRTS by either homologous recombination or bacterial-mediated transposition involving transfer plasmid intermediates. Viral infectivity is maintained, and the strong polyhedrin promoter drives high levels of cDNA transcription. Recombinant baculovirus is used to infect Spodoptera frugiperda (Sf9) insect cells in most cases, or human hepatocytes in some cases. Infection of the latter requires additional genetic modifications to baculovirus. (See Engelhard, E. K. et al. (1994) Proc. Natl. Acad. Sci. USA 91:3224-3227; Sandig, V. et al. (1996) Hum. Gene Ther. 7:1937-1945.)

In most expression systems, PRTS is synthesized as a fusion protein with, e.g., glutathione S-transferase (GST) or a peptide epitope tag, such as FLAG or 6-His, permitting rapid, single-step, affinity-based purification of recombinant fusion protein from crude cell lysates. GST, a 26-kilodalton enzyme from Schistosoma japonicum, enables the purification of fusion proteins on immobilized glutathione under conditions that maintain protein activity and antigenicity (Amersham Pharmacia Biotech). Following purification, the GST moiety can be proteolytically cleaved from PRTS at specifically engineered sites. FLAG, an 8-amino acid peptide, enables immunoaffinity purification using commercially available monoclonal and polyclonal anti-FLAG antibodies (Eastman Kodak). 6-His, a stretch of six consecutive histidine residues, enables purification on metal-chelate resins (QIAGEN). Methods for protein expression and purification are discussed in Ausubel (1995, supra, ch. 10 and 16). Purified PRTS obtained by these methods can be used directly in the assays shown in Examples XVI, XVII, XVIII, and XIX where applicable.

XIII. Functional Assays

PRTS function is assessed by expressing the sequences encoding PRTS at physiologically elevated levels in mammalian cell culture systems. cDNA is subcloned into a mammalian expression vector containing a strong promoter that drives high levels of cDNA expression. Vectors of choice include pCMV SPORT (Life Technologies) and pCR3.1 (Invitrogen, Carlsbad Calif.), both of which contain the cytomegalovirus promoter. 5-10 μg of recombinant vector are transiently transfected into a human cell line, for example, an endothelial or hematopoietic cell line, using either liposome formulations or electroporation. 1-2 μg of an additional plasmid containing sequences encoding a marker protein are co-transfected. Expression of a marker protein provides a means to distinguish transfected cells from nontransfected cells and is a reliable predictor of cDNA expression from the recombinant vector. Marker proteins of choice include, e.g., Green Fluorescent Protein (GFP; Clontech), CD64, or a CD64-GFP fusion protein. Flow cytometry (FCM), an automated, laser optics-based technique, is used to identify transfected cells expressing GFP or CD64-GFP and to evaluate the apoptotic state of the cells and other cellular properties. FCM detects and quantifies the uptake of fluorescent molecules that diagnose events preceding or coincident with cell death. These events include changes in nuclear DNA content as measured by staining of DNA with propidium iodide; changes in cell size and granularity as measured by forward light scatter and 90 degree side light scatter; down-regulation of DNA synthesis as measured by decrease in bromodeoxyuridine uptake; alterations in expression of cell surface and intracellular proteins as measured by reactivity with specific antibodies; and alterations in plasma membrane composition as measured by the binding of fluorescein-conjugated Annexin V protein to the cell surface. Methods in flow cytometry are discussed in Ormerod, M. G. (1994) Flow Cytometry, Oxford, New York N.Y.
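The two-step gating logic (identify transfected cells, then score an apoptosis marker within that population) can be sketched as below. The event records, field names, and cutoff values are all hypothetical. Real FCM analysis would also use scatter gates and compensation, which are omitted here.

```python
from typing import Dict, List

Event = Dict[str, float]  # hypothetical per-cell record, e.g. {"gfp": 850.0, "annexin_v": 40.0}


def percent_apoptotic_among_transfected(events: List[Event], gfp_cutoff: float,
                                        annexin_cutoff: float) -> float:
    """Gate GFP-positive (transfected) cells, then report % Annexin V-positive among them."""
    transfected = [e for e in events if e["gfp"] >= gfp_cutoff]
    if not transfected:
        return 0.0
    apoptotic = [e for e in transfected if e["annexin_v"] >= annexin_cutoff]
    return 100.0 * len(apoptotic) / len(transfected)


events = [
    {"gfp": 900, "annexin_v": 20}, {"gfp": 950, "annexin_v": 600},
    {"gfp": 800, "annexin_v": 30}, {"gfp": 1200, "annexin_v": 10},
    {"gfp": 15, "annexin_v": 700}, {"gfp": 20, "annexin_v": 900},
]
print(percent_apoptotic_among_transfected(events, gfp_cutoff=500, annexin_cutoff=300))  # 25.0
```

Gating on the marker first keeps apoptotic nontransfected cells (the last two events) out of the result.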

The influence of PRTS on gene expression can be assessed using highly purified populations of cells transfected with sequences encoding PRTS and either CD64 or CD64-GFP. CD64 and CD64-GFP are expressed on the surface of transfected cells and bind to conserved regions of human immunoglobulin G (IgG). Transfected cells are efficiently separated from nontransfected cells using magnetic beads coated with either human IgG or antibody against CD64 (DYNAL, Lake Success N.Y.). mRNA can be purified from the cells using methods well known by those of skill in the art. Expression of mRNA encoding PRTS and other genes of interest can be analyzed by northern analysis or microarray techniques.

XIV. Production of PRTS Specific Antibodies

PRTS substantially purified using polyacrylamide gel electrophoresis (PAGE; see, e.g., Harrington, M. G. (1990) Methods Enzymol. 182:488-495), or other purification techniques, is used to immunize rabbits and to produce antibodies using standard protocols.

Alternatively, the PRTS amino acid sequence is analyzed using LASERGENE software (DNASTAR) to determine regions of high immunogenicity, and a corresponding oligopeptide is synthesized and used to raise antibodies by means known to those of skill in the art. Methods for selection of appropriate epitopes, such as those near the C-terminus or in hydrophilic regions, are well described in the art. (See, e.g., Ausubel, 1995, supra, ch. 11.)
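The text does not state which algorithm LASERGENE applies. As a simple stand-in, the sketch below uses a different, well-known method: a sliding-window average over the Hopp-Woods (1981) hydrophilicity scale, returning the most hydrophilic window as a candidate epitope region. The window of 6 follows Hopp and Woods. Setting `window=15` would match the ~15-residue peptides described next. The toy sequence is hypothetical.

```python
from typing import List

# Hopp-Woods (1981) hydrophilicity values; a stand-in, not LASERGENE's own method
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0, "M": -1.3,
    "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def hydrophilicity_profile(protein: str, window: int = 6) -> List[float]:
    """Mean hydrophilicity of each window, indexed by the window's start position."""
    values = [HOPP_WOODS[aa] for aa in protein.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def most_hydrophilic_window(protein: str, window: int = 6) -> int:
    """Start index of the first window with the highest mean hydrophilicity."""
    profile = hydrophilicity_profile(protein, window)
    return max(range(len(profile)), key=profile.__getitem__)


seq = "LLLLLLLLKDEKRDLLLLLL"  # hypothetical toy sequence
start = most_hydrophilic_window(seq)
print(start, seq[start:start + 6])  # 8 KDEKRD
```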

Typically, oligopeptides of about 15 residues in length are synthesized using an ABI 431A peptide synthesizer (Applied Biosystems) using FMOC chemistry and coupled to KLH (Sigma-Aldrich, St. Louis Mo.) by reaction with N-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS) to increase immunogenicity. (See, e.g., Ausubel, 1995, supra.) Rabbits are immunized with the oligopeptide-KLH complex in complete Freund's adjuvant. Resulting antisera are tested for antipeptide and anti-PRTS activity by, for example, binding the peptide or PRTS to a substrate, blocking with 1% BSA, reacting with rabbit antisera, washing, and reacting with radio-iodinated goat anti-rabbit IgG.

XV. Purification of Naturally Occurring PRTS Using Specific Antibodies

Naturally occurring or recombinant PRTS is substantially purified by immunoaffinity chromatography using antibodies specific for PRTS. An immunoaffinity column is constructed by covalently coupling anti-PRTS antibody to an activated chromatographic resin, such as CNBr-activated SEPHAROSE (Amersham Pharmacia Biotech). After the coupling, the resin is blocked and washed according to the manufacturer's instructions.

Media containing PRTS are passed over the immunoaffinity column, and the column is washed under conditions that allow the preferential adsorption of PRTS (e.g., high ionic strength buffers in the presence of detergent). The column is eluted under conditions that disrupt antibody/PRTS binding (e.g., a buffer of pH 2 to pH 3, or a high concentration of a chaotrope, such as urea or thiocyanate ion), and PRTS is collected.

XVI. Identification of Molecules Which Interact with PRTS

PRTS, or biologically active fragments thereof, are labeled with ¹²⁵I Bolton-Hunter reagent. (See, e.g., Bolton, A. E. and W. M. Hunter (1973) Biochem. J. 133:529-539.) Candidate molecules previously arrayed in the wells of a multi-well plate are incubated with the labeled PRTS, washed, and any wells with labeled PRTS complex are assayed. Data obtained using different concentrations of PRTS are used to calculate values for the number, affinity, and association of PRTS with the candidate molecules.
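The text does not name the analysis used to derive the number and affinity of binding sites. One classical option, sketched here, is a Scatchard plot. For a single class of sites, bound/free plotted against bound is a line with slope −1/K_d and x-intercept B_max. The data are synthetic, generated from hypothetical values B_max = 100 and K_d = 2. Nonlinear fitting of the saturation curve is often preferred in practice because the linearization distorts measurement error.

```python
from typing import List, Tuple


def scatchard_fit(free: List[float], bound: List[float]) -> Tuple[float, float]:
    """Fit B/F = Bmax/Kd - B/Kd by least squares; return (Kd, Bmax)."""
    xs = bound
    ys = [b / f for b, f in zip(bound, free)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    kd = -1.0 / slope
    return kd, intercept * kd


# Synthetic one-site data (hypothetical Bmax = 100, Kd = 2, arbitrary units)
free = [0.5, 1.0, 2.0, 4.0, 8.0]
bound = [100 * f / (2 + f) for f in free]
kd, bmax = scatchard_fit(free, bound)
print(round(kd, 3), round(bmax, 3))  # 2.0 100.0
```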

Alternatively, molecules interacting with PRTS are analyzed using the yeast two-hybrid system as described in Fields, S. and O. Song (1989) Nature 340:245-246, or using commercially available kits based on the two-hybrid system, such as the MATCHMAKER system (Clontech).

PRTS may also be used in the PATHCALLING process (CuraGen Corp., New Haven Conn.), which employs the yeast two-hybrid system in a high-throughput manner to determine all interactions between the proteins encoded by two large libraries of genes (Nandabalan, K. et al. (2000) U.S. Pat. No. 6,057,101).

XVII. Demonstration of PRTS Activity

Protease activity is measured by the hydrolysis of appropriate synthetic peptide substrates conjugated with various chromogenic molecules, in which the degree of hydrolysis is quantified spectrophotometrically (or fluorometrically) by measuring the released chromophore (Beynon, R. J. and J. S. Bond (1994) Proteolytic Enzymes: A Practical Approach, Oxford University Press, New York N.Y., pp. 25-55). Peptide substrates are designed according to the category of protease activity as endopeptidase (serine, cysteine, aspartic proteases, or metalloproteases), aminopeptidase (leucine aminopeptidase), or carboxypeptidase (carboxypeptidases A and B, procollagen C-proteinase). Commonly used chromogens are 2-naphthylamine, 4-nitroaniline, and furylacrylic acid. Assays are performed at ambient temperature and contain an aliquot of the enzyme and the appropriate substrate in a suitable buffer. Reactions are carried out in an optical cuvette, and the increase or decrease in absorbance of the chromogen released during hydrolysis of the peptide substrate is measured. The change in absorbance is proportional to the enzyme activity in the assay.
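Turning absorbance readings into an activity value takes two steps: estimate the initial slope dA/dt, then convert it to a rate of product formation with the Beer-Lambert law, rate = (dA/dt)/(ε·l). In the sketch below, the readings and the extinction coefficient are hypothetical placeholders. The actual ε must be taken for the specific chromogen and wavelength used.

```python
from typing import List


def initial_rate(times_s: List[float], absorbance: List[float]) -> float:
    """Slope dA/dt (absorbance units per second) from a least-squares line."""
    n = len(times_s)
    mt, ma = sum(times_s) / n, sum(absorbance) / n
    num = sum((t - mt) * (a - ma) for t, a in zip(times_s, absorbance))
    den = sum((t - mt) ** 2 for t in times_s)
    return num / den


def activity_umol_per_min(dA_per_s: float, epsilon_per_M_cm: float,
                          path_cm: float, volume_ml: float) -> float:
    """Beer-Lambert: product formation rate = (dA/dt) / (epsilon * l), scaled to the cuvette."""
    molar_per_s = dA_per_s / (epsilon_per_M_cm * path_cm)
    mol_per_s = molar_per_s * volume_ml / 1000.0
    return mol_per_s * 1e6 * 60.0


times = [0.0, 30.0, 60.0, 90.0, 120.0]
absorb = [0.100, 0.130, 0.160, 0.190, 0.220]  # hypothetical readings
rate = initial_rate(times, absorb)  # about 0.001 absorbance units per second
print(activity_umol_per_min(rate, epsilon_per_M_cm=10000.0, path_cm=1.0, volume_ml=1.0))
```

With these placeholder values, the result is about 0.006 μmol of chromogen released per minute.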

An alternate assay for ubiquitin hydrolase activity measures the hydrolysis of a ubiquitin precursor. The assay is performed at ambient temperature and contains an aliquot of PRTS and the appropriate substrate in a suitable buffer. Chemically synthesized human ubiquitin-valine may be used as substrate. Cleavage of the C-terminal valine residue from the substrate is monitored by capillary electrophoresis (Franklin, K. et al. (1997) Anal. Biochem. 247:305-309).

In the alternative, an assay for protease activity takes advantage of fluorescence resonance energy transfer (FRET) that occurs when one donor and one acceptor fluorophore with an appropriate spectral overlap are in close proximity. A flexible peptide linker containing a cleavage site specific for PRTS is fused between a red-shifted variant (RSGFP4) and a blue variant (BFP5) of Green Fluorescent Protein. This fusion protein has spectral properties that suggest energy transfer is occurring from BFP5 to RSGFP4. When the fusion protein is incubated with PRTS, the substrate is cleaved, and the two fluorescent proteins dissociate. This is accompanied by a marked decrease in energy transfer, which is quantified by comparing the emission spectra before and after the addition of PRTS (Mitra, R. D. et al. (1996) Gene 173:13-17). This assay can also be performed in living cells. In this case, the fluorescent substrate protein is expressed constitutively in cells, and PRTS is introduced on an inducible vector so that FRET can be monitored in the presence and absence of PRTS (Sagot, I. et al. (1999) FEBS Lett. 447:53-57).
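One simple way to reduce the before/after comparison to a number is sketched below, under an assumption the text does not make. The sketch reads the acceptor (RSGFP4) emission under donor (BFP5) excitation and assumes this signal falls linearly from an intact-substrate reference to a fully digested reference. All readings are hypothetical.

```python
def fraction_cleaved(signal: float, signal_intact: float, signal_cleaved: float) -> float:
    """Fraction of FRET substrate cleaved, from acceptor (RSGFP4) emission under donor (BFP5)
    excitation, assuming the signal falls linearly from the intact to the fully cleaved value."""
    f = (signal_intact - signal) / (signal_intact - signal_cleaved)
    return min(max(f, 0.0), 1.0)  # clamp measurement noise into [0, 1]


# Hypothetical readings: intact substrate, fully digested substrate, and one time point
print(fraction_cleaved(60.0, signal_intact=100.0, signal_cleaved=20.0))  # 0.5
```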

XVIII. Identification of PRTS Substrates

Phage display libraries can be used to identify optimal substrate sequences for PRTS. A random hexamer followed by a linker and a known antibody epitope is cloned as an N-terminal extension of gene III in a filamentous phage library. Gene III codes for a coat protein, and the epitope will be displayed on the surface of each phage particle. The library is incubated with PRTS under proteolytic conditions so that the epitope will be removed if the hexamer codes for a PRTS cleavage site. An antibody that recognizes the epitope is added along with immobilized protein A. Uncleaved phage, which still bear the epitope, are removed by centrifugation. Phage in the supernatant are then amplified and undergo several more rounds of screening. Individual phage clones are then isolated and sequenced. Reaction kinetics for these peptide substrates can be studied using an assay in Example XVII, and an optimal cleavage sequence can be derived (Ke, S. H. et al. (1997) J. Biol. Chem. 272:16603-16609).
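Before kinetic testing, a quick first pass over the sequenced clones is a position-by-position tally of the selected hexamers. The sketch below computes that consensus. It is not the cited authors' method, which relies on kinetics, and the clone sequences are hypothetical.

```python
from collections import Counter
from typing import List


def positional_consensus(peptides: List[str]) -> str:
    """Most frequent residue at each position across equal-length sequenced inserts."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all inserts must have the same length")
    consensus = []
    for i in range(length):
        residue, _count = Counter(p[i] for p in peptides).most_common(1)[0]
        consensus.append(residue)
    return "".join(consensus)


# Hypothetical hexamers translated from phage clones surviving the final round
clones = ["LVPRGS", "LVPRGA", "IVPRGS", "LAPRGS"]
print(positional_consensus(clones))  # LVPRGS
```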

To screen for in vivo PRTS substrates, this method can be expanded to screen a cDNA expression library displayed on the surface of phage particles (T7SELECT 10-3 Phage display vector, Novagen, Madison Wis.) or yeast cells (pYD1 yeast display vector kit, Invitrogen, Carlsbad Calif.). In this case, entire cDNAs are fused between Gene III and the appropriate epitope.

XIX. Identification of PRTS Inhibitors

Compounds to be tested are arrayed in the wells of a multi-well plate in varying concentrations along with an appropriate buffer and substrate, as described in the assays in Example XVII. PRTS activity is measured for each well, and the ability of each compound to inhibit PRTS activity can be determined, as well as the dose-response kinetics. This assay could also be used to identify molecules which enhance PRTS activity.
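A dose-response series can be summarized by an IC50, the concentration that gives 50% of the uninhibited activity. The sketch below makes a quick estimate by interpolating linearly in log10(concentration) between the two tested concentrations that bracket 50%. This is a simplification chosen for illustration. A four-parameter logistic fit is the more common full analysis. The concentrations and activities are hypothetical.

```python
import math
from typing import List, Optional


def percent_activity(rate_with_compound: float, rate_control: float) -> float:
    """Activity relative to the uninhibited control well, in percent."""
    return 100.0 * rate_with_compound / rate_control


def ic50_log_interpolated(concs: List[float], activity_pct: List[float]) -> Optional[float]:
    """Concentration giving 50% activity, interpolated in log10(concentration) between
    the bracketing tested concentrations. concs must be ascending and > 0."""
    points = list(zip(concs, activity_pct))
    for (c1, a1), (c2, a2) in zip(points, points[1:]):
        if a1 >= 50.0 >= a2 and a1 != a2:
            frac = (a1 - 50.0) / (a1 - a2)
            log_ic50 = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_ic50
    return None  # 50% inhibition not bracketed by the tested range


concs_uM = [0.1, 1.0, 10.0, 100.0]
activity = [95.0, 70.0, 30.0, 5.0]  # hypothetical % of uninhibited PRTS rate
print(round(ic50_log_interpolated(concs_uM, activity), 3))  # 3.162
```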

In the alternative, phage display libraries can be used to screen for peptide PRTS inhibitors. Candidates are found among peptides which bind tightly to a protease. In this case, multi-well plate wells are coated with PRTS and incubated with a random peptide phage display library or a cyclic peptide library (Koivunen, E. et al. (1999) Nat. Biotechnol. 17:768-774). Unbound phage are washed away, and selected phage are amplified and rescreened for several more rounds. Candidates are tested for PRTS inhibitory activity using an assay described in Example XVII.

Various modifications and variations of the described methods and systems of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with certain embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention which are obvious to those skilled in molecular biology or related fields are intended to be within the scope of the following claims.

TABLE 1

| Incyte Project ID | Polypeptide SEQ ID NO: | Incyte Polypeptide ID | Polynucleotide SEQ ID NO: | Incyte Polynucleotide ID |
|---|---|---|---|---|
| 6930294 | 1 | 6930294CD1 | 18 | 6930294CB1 |
| 7473018 | 2 | 7473018CD1 | 19 | 7473018CB1 |
| 7479221 | 3 | 7479221CD1 | 20 | 7479221CB1 |
| 2923874 | 4 | 2923874CD1 | 21 | 2923874CB1 |
| 55122335 | 5 | 55122335CD1 | 22 | 55122335CB1 |
| 7473550 | 6 | 7473550CD1 | 23 | 7473550CB1 |
| 7478108 | 7 | 7478108CD1 | 24 | 7478108CB1 |
| 7482021 | 8 | 7482021CD1 | 25 | 7482021CB1 |
| 7482145 | 9 | 7482145CD1 | 26 | 7482145CB1 |
| 55022586 | 10 | 55022586CD1 | 27 | 55022586CB1 |
| 3238072 | 11 | 3238072CD1 | 28 | 3238072CB1 |
| 7482034 | 12 | 7482034CD1 | 29 | 7482034CB1 |
| 7474351 | 13 | 7474351CD1 | 30 | 7474351CB1 |
| 2232483 | 14 | 2232483CD1 | 31 | 2232483CB1 |
| 7481712 | 15 | 7481712CD1 | 32 | 7481712CB1 |
| 8213480 | 16 | 8213480CD1 | 33 | 8213480CB1 |
| 7478405 | 17 | 7478405CD1 | 34 | 7478405CB1 |

TABLE 2
Polypeptide SEQ ID NO: | Incyte Polypeptide ID | GenBank ID NO: | Probability Score | GenBank Homolog
1 | 6930294CD1 | g190418 | 4.50E−169 | [Homo sapiens] preprocathepsin L precursor (Joseph, L. J. et al. (1988) J. Clin. Invest. 81: 1621-1629)
2 | 7473018CD1 | g5669607 | 2.20E−25 | [Equus caballus] caspase-1 (Wardlow, S. et al. (1999) Nucleotide sequence of equine caspase-1 cDNA. DNA Seq. 10: 133-137)
3 | 7479221CD1 | g6573163 | 3.80E−298 | [Rattus norvegicus] ubiquitin specific processing protease (Lin, H. et al. (2000) Divergent N-terminal sequences target an inducible testis deubiquitinating enzyme to distinct subcellular structures. Mol. Cell Biol. 20: 6568-6578)
4 | 2923874CD1 | g306706 | 5.20E−207 | [Homo sapiens] dipeptidyl aminopeptidase like protein (Yokotani, N. et al. (1993) Hum. Mol. Genet. 2: 1037-1039)
5 | 55122335CD1 | g10800858 | 0 | [fl] [Homo sapiens] aminopeptidase B
6 | 7473550CD1 | g2981641 | 6.40E−201 | [Xenopus laevis] ovochymase/ovotryptase polyprotease (Lindsay, L. L. et al. (1999) Proc. Natl. Acad. Sci. U.S.A. 96: 11253-11258)
7 | 7478108CD1 | g544755 | 3.00E−166 | [Oryctolagus cuniculus] aminopeptidase N, APN {type II membrane protein} (Santos, A. N. et al. (2000) Cell. Immunol. 201: 22-32)
8 | 7482021CD1 | g6573165 | 5.60E−210 | [Rattus norvegicus] testis ubiquitin specific processing protease (Lin, H. et al. (2000) Divergent N-terminal sequences target an inducible testis deubiquitinating enzyme to distinct subcellular structures. Mol. Cell Biol. 20: 6568-6578)
9 | 7482145CD1 | g6683668 | 1.70E−114 | [Carassius auratus] alpha 4 subunit of 20S proteasome (Tokumoto, M. et al. (2000) Eur. J. Biochem. 267: 97-103)
10 | 55022586CD1 | g14279329 | 0 | [fl] [Homo sapiens] ubiquitin specific protease
11 | 3238072CD1 | g5410230 | 5.50E−56 | [Homo sapiens] ubiquitin-specific protease 3 (Sloper-Mould, K. E. et al. (1999) J. Biol. Chem. 274: 26878-26884)
12 | 7482034CD1 | g4545092 | 3.80E−60 | [Sus scrofa] proteasome subunit LMP7 (Chun, T. et al. (1999) Immunogenetics 49: 72-77)
13 | 7474351CD1 | g4512604 | 5.00E−48 | [Canis sp.] mastin precursor (Rice, K. D. et al. (1998) Curr. Pharm. Des. 4: 381-396)
14 | 2232483CD1 | g6465985 | 1.40E−229 | [Homo sapiens] quiescent cell proline dipeptidase (Underwood, R. (1999) J. Biol. Chem. 274: 34053-34058)
15 | 7481712CD1 | g13528975 | 1.00E−122 | [fl] [Homo sapiens] (BC005279) carboxypeptidase A1 (pancreatic)
16 | 8213480CD1 | g13157560 | 0 | [3′ incom] [Homo sapiens] dJ964F7.1 (novel disintegrin and reprolysin metalloproteinase family protein)
17 | 7478405CD1 | g5923786 | 9.10E−164 | [Homo sapiens] zinc metalloprotease ADAMTS6 (Hurskainen, T. L. et al. (1999) J. Biol. Chem. 274: 25555-25563)

TABLE 3
SEQ ID NO: | Incyte Polypeptide ID | Amino Acid Residues | Potential Phosphorylation Sites | Potential Glycosylation Sites
  Signature Sequences, Domains and Motifs | Analytical Methods and Databases
1 | 6930294CD1 | 333 | S160 S210 T155 T84 Y112 | N221
  Papain family cysteine protease Peptidase_C1: A114-T332 | HMMER_PFAM
  Eukaryotic thiol protease active site BL00139: Q132-F141, N175-M183, D275-S284, Y295-Y311 | BLIMPS_BLOCKS
  PAPAIN CYSTEINE PROTEASE PR00705: Q132-L147, H276-E286, Y295-R301 | BLIMPS_PRINTS
  EUKARYOTIC THIOL PROTEASES CYSTEINE DM00081|P07711|19-332: L19-V333 DM00081|P25975|20-333: D22-V333 DM00081|P06797|19-332: F21-V333 DM00081|P15242|20-332: T20-V333 | BLAST_DOMO
  PROTEASE PRECURSOR SIGNAL CYSTEINE PROTEINASE HYDROLASE THIOL ZYMOGEN CATHEPSIN GLYCOPROTEIN PD000158: S117-S218, C169-P331 PD000247: K31-E113 | BLAST_PRODOM
  Eukaryotic thiol (cysteine) protease active sites Thiol_Protease_Cys: Q132-A143 Thiol_Protease_His: L274-S284 | MOTIFS
  Eukaryotic thiol (cysteine) protease active sites thiol_protease_cys.prf: E113-E163 thiol_protease_his.prf: Q257-G307 | PROFILESCAN
  signal_peptide: M1-T20 | HMMER
  signal_cleavage: M1-A17 | SPSCAN
2 | 7473018CD1 | 90 | S36 T49 T62 | N47
  CASPASE RECRUITMENT DOMAIN CARD: | HMMER_PFAM
  INTERLEUKIN-1 BETA CONVERTING ENZYME FAMILY HISTIDINE DM07463|P29466|1-122: M1-L89 DM07463|P29452|1-121: M1-L89 | BLAST_DOMO
  signal_cleavage: M1-S36 | SPSCAN
3 | 7479221CD1 | 605 | S14 S142 S152 S158 S190 S207 S329 S335 S382 S42 S490 S506 S70 T12 T137 T175 T227 T235 T239 T377 T424 T433 T454 T463 T512 T572 T7 T99 Y17 Y22 | N548 N574
  Ubiquitin carboxyl-terminal hydrolase family 2 signatures UCH-1: A267-R298 UCH-2: N537-L598 | HMMER_PFAM
  Ubiquitin carboxyl-terminal hydrolase family 2 signature BL00972: G268-L285, Y353-L362, I411-C425, V540-S564, T567-T588 | BLIMPS_BLOCKS
  UBIQUITIN CARBOXYL-TERMINAL HYDROLASES FAMILY 2 DM00659|P40818|782-1103: L272-L594 DM00659|P35123|139-432: L272-I445 DM00659|P35125|220-508: L272-L455 DM00659|P32571|566-873: N271-F531 | BLAST_DOMO
  PROTEASE UBIQUITIN HYDROLASE ENZYME UBIQUITINSPECIFIC CARBOXYLTERMINAL DEUBIQUITINATING THIOLESTERASE PROCESSING CONJUGATION PD000590: M258-S432 PD017412: F435-E534 | BLAST_PRODOM
  Ubiquitin carboxyl-terminal hydrolase family 2 signatures Uch_2_1: G268-Q283 Uch_2_2: Y541-Y558 | MOTIFS
4 | 2923874CD1 | 743 | S157 S163 S260 S304 S355 S393 S589 S593 S635 S643 S709 T238 T294 T361 T382 T423 T524 T71 Y508 | N204 N289 N58 N66 N695 N707
  Dipeptidyl peptidase IV active site signature DPPIV_N_term: M1-D525 | HMMER_PFAM
  Prolyl oligopeptidase family Peptidase_S9: F527-I603 | HMMER_PFAM
  Prolyl endopeptidase family BL00708B: D573-I603 | BLIMPS_BLOCKS
  Dipeptidyl peptidase IV PF00930: H77-Y98, R159-P209, Y221-Y247, E265-E297, L365-I375, E420-N465, P499-I536, D537-K579, F615-P642, N665-L685 | BLIMPS_PFAM
  PROLYL ENDOPEPTIDASE FAMILY SERINE DM02461|P42659|335-862: P222-E743 DM02461|P27487|192-765: E167-C727 DM02461|I38593|190-759: I169-C727 DM02461|P33894|340-930: I169-V694, Y221-H715 | BLAST_DOMO
  DPP IV HYDROLASE PROTEASE SERINE PEPTIDASE DIPEPTIDASE TRANSMEMBRANE GLYCOPROTEIN PD003086: Y20-P493, S275-T524 PD003048: I603-C727 | BLAST_PRODOM
5 | 55122335CD1 | 650 | S208 S318 S359 S496 T141 T368 T374 T386 T408 T412 |
  Peptidase family M1 Peptidase_M1: R32-G417 | HMMER_PFAM
  MEMBRANE ALANYL DIPEPTIDASE FAMILY SIGNATURE PR00756: R176-Y191, F220-I235, F295-L305, V322-T337, W341-Y353 | BLIMPS_PRINTS
  Neutral Zn metalloprotease, Zn-binding region BL00142: V322-F332 | BLIMPS_BLOCKS
  do HYDROLASE; LEUKOTRIENE; A-4; ZINC; DM08707|P19602|7-609: H38-H634 DM08707|Q10740|58-670: W152-H634, A26-S82 | BLAST_DOMO
  do ZINC; AMINOPEPTIDASE; METALLOPEPTIDASE; NEUTRAL; DM00700|I55441|163-916: A159-P489 DM00700|S47274|1-784: G160-P489 | BLAST_DOMO
  AMINOPEPTIDASE B EC 3.4.11.6 ARGINYL ARGININE CYTOSOL IV APB HYDROLASE ZINC METALLOPROTEASE PD143187: A2-F165 | BLAST_PRODOM
  HYDROLASE ZINC METALLOPROTEASE LEUKOTRIENE A4 LTA4 A4 MULTIFUNCTIONAL ENZYME BIOSYNTHESIS PD008823: Y533-Q643 | BLAST_PRODOM
  AMINOPEPTIDASE HYDROLASE METALLOPROTEASE ZINC N GLYCOPROTEIN TRANSMEMBRANE SIGNALANCHOR MEMBRANE PD001134: R248-D518 | BLAST_PRODOM
  Neutral Zn metalloprotease, Zn-binding region Zinc_Protease: V322-W331 | MOTIFS
6 | 7473550CD1 | 932 | S319 S326 S353 S387 S394 S426 S49 S565 S665 S708 S840 S906 S91 S93 T103 T126 T297 T337 T454 T545 T744 T853 T910 Y735 | N324 N424 N500 N52 N706 N99
  Trypsin family active site trypsin: I47-I291, I568-I809 | HMMER_PFAM
  CUB domain CUB: S310-V400, C412-F521 | HMMER_PFAM
  Serine proteases, trypsin family active site BL00134: C593-C609, D759-G782, P796-I809 | BLIMPS_BLOCKS
  Kringle domain proteins BL00021B: C72-L89 | BLIMPS_BLOCKS
  CHYMOTRYPSIN SERINE PROTEASE ACTIVE SITE PR00722: G594-C609, S653-L667 | BLIMPS_PRINTS
  TRYPSIN DM00018|P23578|42-289: R567-I813, R46-P268 DM00018|A57014|45-284: I568-I813, N52-P268 DM00018|P48038|39-286: R567-I813, P259-P268 DM00018|P03952|392-624: G570-K812, N52-Q293 | BLAST_DOMO
  PROTEASE SERINE PRECURSOR SIGNAL HYDROLASE ZYMOGEN GLYCOPROTEIN FAMILY MULTIGENE FACTOR PD000046: G589-I809, W50-I291 | BLAST_PRODOM
  Serine proteases, trypsin family, active sites Trypsin_His: V83-C88, L604-C609 Trypsin_Ser: D231-V242 | MOTIFS
  Serine proteases, trypsin family, active sites trypsin_his.prf: L64-Q115, L585-T634 trypsin_ser.prf: L216-G264, I743-Q792 | PROFILESCAN
  signal_peptide: M1-G22 | HMMER
  signal_cleavage: M1-G22 | SPSCAN
7 | 7478108CD1 | 990 | S200 S237 S282 S353 S442 S536 S54 S631 S641 S643 S74 S835 S917 S979 T128 T134 T141 T321 T403 T562 T605 T69 T706 T850 T885 T967 T990 | N132 N168 N261 N288 N319 N338 N346 N360 N582 N600 N607 N619 N653 N848 N887
  Peptidase family M1 Peptidase_M1: L98-G506 | HMMER_PFAM
  MEMBRANE ALANYL DIPEPTIDASE FAMILY SIGNATURE PR00756: W431-Y443, R245-F260, F297-I312, F376-L386, V412-T427 | BLIMPS_PRINTS
  Neutral Zn metalloprotease, Zn-binding region BL00142: V412-F422 | BLIMPS_BLOCKS
  do ZINC; AMINOPEPTIDASE; METALLOPEPTIDASE; NEUTRAL; DM00700|P15541|67-903: W93-I932 DM00700|P15145|66-901: W93-I929 DM00700|P15684|70-903: W93-I929 DM00700|P15144|70-904: W93-I932 | BLAST_DOMO
  AMINOPEPTIDASE HYDROLASE METALLOPROTEASE ZINC GLYCOPROTEIN TRANSMEMBRANE SIGNALANCHOR PD001134: Q95-T585 PD002091: V587-Y874 | BLAST_PRODOM
  Neutral Zn metalloprotease, Zn-binding region Zinc_Protease: V412-W421 | MOTIFS
  signal_peptide: M1-A31 | HMMER
  signal_cleavage: M1-A34 | SPSCAN
  transmembrane domain: A16-Y37 | HMMER
8 | 7482021CD1 | 396 | S120 S126 S173 S281 S297 T168 T215 T224 T245 T254 T303 T363 T8 | N339 N365
  Ubiquitin carboxyl-terminal hydrolase family 2 UCH-1: A58-R89 UCH-2: N328-L389 | HMMER_PFAM
  Ubiquitin carboxyl-terminal hydrolase family 2 BL00972: G59-L76, Y144-L153, I202-C216, V331-S355, T358-T379 | BLIMPS_BLOCKS
  UBIQUITIN CARBOXYL-TERMINAL HYDROLASE FAMILY 2 DM00659|P40818|782-1103: L63-L385 DM00659|P35123|139-432: L63-I236 DM00659|P35125|220-508: L63-L246 DM00659|P32571|566-873: N62-F322 | BLAST_DOMO
  PROTEASE UBIQUITINSPECIFIC HYDROLASE ENZYME C-TERMINAL DEUBIQUITINATING THIOLESTERASE PROCESSING CONJUGATION PD000590: S51-S223 PD017412: F226-E325 | BLAST_PRODOM
  Ubiquitin carboxyl-terminal hydrolase family 2 Uch_2_1: G59-Q74 Uch_2_2: Y332-Y349 | MOTIFS
  signal_cleavage: M1-N46 | SPSCAN
9 | 7482145CD1 | 250 | S166 S185 S223 S246 S3 S32 S95 T115 T169 T232 T60 T99 Y178 | N177
  Proteasome A-type and B-type proteasome: T33-T179 | HMMER_PFAM
  Proteasome A-type subunits signature BL00388: Y5-K50, K63-V104, Q118-D139, L146-K176 | BLIMPS_BLOCKS
  Proteasome A-type and B-type PF00227: F12-Y23 | BLIMPS_PFAM
  Proteasome A-type subunits signature proteasome.prf: M1-V47 | PROFILESCAN
  PROTEASOME A-TYPE SUBUNITS DM00341|P48004|1-226: Y5-S223 DM00341|S23451|3-222: S3-M221 DM00341|P22769|3-222: S3-M221 DM00341|P34120|4-220: S3-L219 | BLAST_DOMO
  PROTEASOME HYDROLASE PROTEASE SUBUNIT MULTICATALYTIC COMPLEX ENDOPEPTIDASE MACROPAIN COMPONENT PROTEIN PD000280: S32-K191 | BLAST_PRODOM
  Proteasome A-type subunits signature Proteasome_A: Y5-A27 | MOTIFS
10 | 55022586CD1 | 1045 | S47 S76 S109 S113 T130 T134 S137 S205 T207 S228 S248 T260 S279 S347 S368 S453 S479 T484 T489 S494 S503 S504 S517 S520 T532 T534 S550 S620 S624 S625 T662 S668 S700 S713 T719 T753 S760 S787 T813 S824 S867 T872 Y888 S930 T934 S939 S964 T1016 S1021 T1043 | N282 N310 N373 N639 N711 N822
  PROBABLE UBIQUITIN CARBOXYLTERMINAL HYDROLASE K02C4.3 EC 3.1.2.15 THIOLESTERASE UBIQUITINSPECIFIC PROCESSING PROTEASE DEUBIQUITINATING ENZYME HYPOTHETICAL PROTEIN CONJUGATION THIOL PD138085: F540-S720, Y316-S720 | BLAST_PRODOM
  Ubiquitin carboxyl-terminal hydrolase family 2 proteins BL00972: G163-L180, E251-T260, P583-N607, R610-R631 | BLIMPS_BLOCKS
  Ubiquitin carboxyl-terminal hydrolase family UCH-1: V162-Y193, UCH-2: R580-N649 | HMMER_PFAM
  Ubiquitin carboxyl-terminal hydrolase family Uch_2_2: Y584-Y601 | MOTIFS
11 | 3238072CD1 | 622 | S142 S166 S221 S238 S245 S250 S285 S304 S363 S455 S460 S493 S523 S56 S574 S611 S96 T149 T165 T173 T354 T355 T368 T438 T50 T589 T88 | N243 N424 N566
  Ubiquitin carboxyl-terminal hydrolase family 2 UCH-1: T187-L218 UCH-2: E528-Q590 | HMMER_PFAM
  Ubiquitin carboxyl-terminal hydrolase family 2 BL00972: G188-L205, Y329-L338, V375-C389, Y532-N556, G559-K580 | BLIMPS_BLOCKS
  UBIQUITIN CARBOXYL-TERMINAL HYDROLASES FAMILY 2 DM00659|P40818|782-1103: L291-G542, V421-L586, L192-F215 DM00659|Q09738|149-388: K306-V421, V421-G542, N191-N217 DM00659|S57874|537-787: H288-T426, L192-N217 | BLAST_DOMO
  PROTEASE UBIQUITINSPECIFIC HYDROLASE ENZYME CARBOXYLTERMINAL DEUBIQUITINATING THIOLESTERASE PROCESSING CONJUGATION PD000590: N281-T398, A183-T223 | BLAST_PRODOM
  Ubiquitin carboxyl-terminal hydrolase family 2 Uch_2_1: G188-Q203 Uch_2_2: Y532-Y550 | MOTIFS
12 | 7482034CD1 | 345 | S125 S201 S242 S276 S282 S37 T142 T258 T332 Y207 |
  Proteasome A-type and B-type subunit proteasome: T96-R238 | HMMER_PFAM
  Proteasome B-type subunit BL00854: A99-A144, F206-D234, A257-G266 | BLIMPS_BLOCKS
  PROTEASOME COMPONENT SIGNATURE PR00141: H259-L270, F102-G117, G223-D234, D234-E245 | BLIMPS_PRINTS
  PROTEASOME B-TYPE SUBUNITS DM00618|P28062|46-260: G77-V281 DM00618|P30656|48-264: P80-W278 DM00618|P28072|5-222: P80-E279 DM00618|I49120|1-185: L98-E279 | BLAST_DOMO
  PROTEASOME HYDROLASE PROTEASE SUBUNIT MULTICATALYTIC COMPLEX ENDOPEPTIDASE MACROPAIN COMPONENT PD000280: T95-E245 | BLAST_PRODOM
  Proteasome_B: L98-D145 | MOTIFS
  signal_peptide: M1-A30 | HMMER
  signal_cleavage: M1-A30 | SPSCAN
13 | 7474351CD1 | 948 | S179 S19 S194 S287 S310 S514 S522 S613 S648 S687 S751 S923 T150 T315 T327 T337 T578 T653 T718 T722 T738 T760 T919 T95 Y467 | N159 N247 N325 N335 N372 N630
  Trypsin family serine protease active site trypsin: A218-I406, V419-Q496, L636-R761 | HMMER_PFAM
  Trypsin family serine protease active site trypsin_ser.prf: R705-G748 | PROFILESCAN
  CHYMOTRYPSIN SERINE PROTEASE PR00722C: R720-V732 | BLIMPS_PRINTS
  TRYPSIN DM00018|P19236|20-262: E212-Q408, L636-W766 DM00018|P21845|31-271: F219-Q408, L643-R761 DM00018|Q02844|29-268: R220-I406, P629-R761 DM00018|P15157|31-270: E215-I406, P629-R761 | BLAST_DOMO
  PROTEASE SERINE PRECURSOR SIGNAL HYDROLASE ZYMOGEN GLYCOPROTEIN FAMILY MULTIGENE PD000046: D232-I406 | BLAST_PRODOM
  Kringle domain proteins BL00021: V276-G297, G365-I406 | BLIMPS_BLOCKS
14 | 2232483CD1 | 444 | S291 S305 S402 S409 S60 T121 T212 T314 T75 | N289 N330 N337 N380 N50 N86
  Prolyl aminopeptidase family PR00793C: V158-R172 | BLIMPS_PRINTS
  Prolyl oligopeptidase family BL00862D: G160-A180 | BLIMPS_PRINTS
  Prolyl endopeptidase family BL00708B: D137-L167 | BLIMPS_BLOCKS
  alpha/beta hydrolase fold abhydrolase: A100-A334 | HMMER_PFAM
  do LYSOSOMAL; PRO-X; CARBOXYPEPTIDASE; DM03192|P42785|3-487: A4-T206, V213-F342, D355-E417 DM03192|P34676|1-498: F31-V189, Y210-I377, S354-K426 DM03192|P34610|31-480: R39-F342, Q324-R414 DM03192|P34528|84-584: F36-A191, C326-K415, S291-T339 | BLAST_DOMO
  PROTEIN CARBOXYPEPTIDASE LYSOSOMAL PROX SIMILAR HUMAN CHROMOSOME III F23B2.12 PD149833: L243-N337, S360-L416 | BLAST_PRODOM
  signal_peptide: M1-A21 | HMMER
  Leucine_Zipper: L128-L149 | MOTIFS
  signal_cleavage: M1-A21 | SPSCAN
15 | 7481712CD1 | 514 | S202 S225 S336 S377 T219 T316 T494 T504 Y436 | N115 N249 N334 N359 N93
  Zinc carboxypeptidase Zn binding region Zn_carbOpept: Y217-E497 | HMMER_PFAM
  Zinc carboxypeptidases, Zn binding region BL00132: Y217-F257, P265-W278, Y295-K335, P339-K353, P365-H391, N393-L414, T450-G467 | BLIMPS_BLOCKS
  CARBOXYPEPTIDASE A METALLOPROTEASE FAMILY PR00765: I243-L255, P265-I279, G345-K353, I398-Y411 | BLIMPS_PRINTS
  Zn carboxypeptidases, Zn-binding region signatures carboxypept_zn_2.prf: E380-L435 | PROFILESCAN
  ZINC CARBOXYPEPTIDASES, ZINC-BINDING REGION 1 DM00683|P15085|112-418: R207-P513 DM00683|P48052|111-416: E206-P513 DM00683|A56171|111-416: E206-P513 DM00683|P19222|111-416: E206-P513 | BLAST_DOMO
  CARBOXYPEPTIDASE PRECURSOR SIGNAL HYDROLASE ZINC ZYMOGEN PROTEIN D B GP180 CARBOXYPEPTIDASE PD001916: Y217-Y411 | BLAST_PRODOM
  Zinc carboxypeptidases, Zn binding region signatures Carboxypept_Zn_1: P265-T287 Carboxypept_Zn_2: H401-Y411 | MOTIFS
16 | 8213480CD1 | 787 | S162 S389 S450 S547 S55 S61 S761 T174 T208 T258 T264 T302 T605 Y243 | N109 N145 N231 N276 N448
  Reprolysin (M12B) family zinc metalloprotease Reprolysin: K210-P409 | HMMER_PFAM
  Reprolysin family propeptide Pep_M12B_propep: E80-Q198 | HMMER_PFAM
  Neutral Zn metallopeptidase Zn-binding region BL00142: T342-G352 | BLIMPS_BLOCKS
  Neutral Zn metallopeptidase Zn-binding region zinc_protease.prf: E323-A376 | PROFILESCAN
  Disintegrin signature disintegrin: E426-L501 | HMMER_PFAM
  Disintegrins signature disintegrins.prf: G352-D498 | PROFILESCAN
  Disintegrin signature BL00427: C443-P497 | BLIMPS_BLOCKS
  DISINTEGRIN SIGNATURE PR00289: C457-R476, E486-D498 | BLIMPS_PRINTS
  do ZINC; NEUTRAL METALLOPEPTIDASE; ATROLYSIN; DM00368|S60257|204-414: R202-D410 DM00368|Q05910|189-395: R206-D410 DM00368|P28891|1-202: E204-P409 | BLAST_DOMO
  do ZINC; REGULATED; EPIDIDYMAL; NEUTRAL; DM00591|S60257|492-628: F487-G608 | BLAST_DOMO
  METALLOPROTEASE PRECURSOR HYDROLASE SIGNAL ZINC VENOM CELL TRANSMEMBRANE ADHESION PD000791: R209-P409 PD000935: L70-M169 | BLAST_PRODOM
  CELL ADHESION PLATELET BLOOD COAGULATION VENOM DISINTEGRIN METALLOPROTEASE PRECURSOR SIGNAL PD000664: E426-Y500 | BLAST_PRODOM
  TRANSMEMBRANE METALLOPROTEASE SIGNAL PRECURSOR GLYCOPROTEIN CELL FERTILIN BETA ADHESION PD001269: D503-L572 | BLAST_PRODOM
  signal_peptide: M1-G27 | HMMER
  Neutral Zn metallopeptidase Zn binding region Zinc_Protease: T342-L351 | MOTIFS
  signal_cleavage: M1-G27 | SPSCAN
17 | 7478405CD1 | 1082 | S1021 S1060 S220 S279 S289 S396 S631 S698 S795 S89 S914 S953 T1025 T135 T171 T206 T390 T421 T65 T674 T747 T817 T871 Y270 | N151 N190 N313 N745 N838 N909
  Reprolysin family propeptide Pep_M12B_propep: E111-R222 | HMMER_PFAM
  Reprolysin (M12B) family zinc metalloprotease zinc binding region Reprolysin: V295-P498 | HMMER_PFAM
  Neutral Zn metalloprotease, Zn-binding region BL00142: T433-G443 | BLIMPS_BLOCKS
  do ZINC; METALLOPEPTIDASE; NEUTRAL; ATROLYSIN; DM00368|Q05910|189-395: V295-P498 DM00368|S48169|140-343: V295-P498 DM00368|P34179|1-202: V295-P498 DM00368|P15167|190-392: V295-P498 | BLAST_DOMO
  METALLOPROTEASE PRECURSOR HYDROLASE SIGNAL ZINC VENOM CELL PROTEIN TRANSMEMBRANE ADHESION PD000791: V295-P498 | BLAST_PRODOM
  PROTEIN PROCOLLAGEN THROMBOSPONDIN MOTIFS NPROTEINASE A DISINTEGRIN METALLOPROTEASE WITH ADAMTS1 PD011654: V676-C748 PD013511: K509-V578 | BLAST_PRODOM
  Thrombospondin type 1 domain tsp_1: S593-C643, R873-C931, G938-C991, P993-C1048 | HMMER_PFAM
  signal_cleavage: M1-S16 | SPSCAN
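Table 3 writes single residues as a one-letter amino acid code followed by a 1-based position (for example, S160) and spans as two such residues joined by a hyphen (for example, Q132-F141). The following Python sketch, offered only as a reading aid, parses that notation from one table cell and checks that the spans fall within the stated number of amino acid residues. The regular expressions assume the 20 standard one-letter codes, and the function names are hypothetical. `parse_sites` is intended for the phosphorylation and glycosylation cells only, because accession strings elsewhere in a row (such as P07711 in a DOMO identifier) would also match its pattern.

```python
import re
from typing import List, Tuple

AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter amino acid codes
# A single residue such as "S160" (word boundaries skip strings like "BL00139").
SITE = re.compile(rf"\b([{AA}])(\d+)\b")
# A span such as "Q132-F141"; tolerates a stray space after the hyphen.
SPAN = re.compile(rf"([{AA}])(\d+)-\s*([{AA}])(\d+)")


def parse_sites(cell: str) -> List[Tuple[str, int]]:
    """Parse a site list such as 'S160 S210 T155' into (residue, position)."""
    return [(aa, int(pos)) for aa, pos in SITE.findall(cell)]


def parse_spans(cell: str) -> List[Tuple[str, int, str, int]]:
    """Parse spans such as 'Q132-F141, N175-M183' into
    (first residue, start, last residue, end) tuples."""
    return [(a, int(s), b, int(e)) for a, s, b, e in SPAN.findall(cell)]


def spans_within(cell: str, length: int) -> bool:
    """True if every span in the cell lies within residues 1..length."""
    return all(1 <= s <= e <= length for _, s, _, e in parse_spans(cell))


# Entries from the SEQ ID NO: 1 row (6930294CD1, 333 residues).
print(parse_sites("S160 S210 T155 T84 Y112"))
print(parse_spans("BL00139: Q132-F141, N175-M183, D275-S284, Y295-Y311"))
print(spans_within("Peptidase_C1: A114-T332", 333))
```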

TABLE 4
Polynucleotide SEQ ID NO: | Incyte Polynucleotide ID | Sequence Length | Selected Fragment(s) | Sequence Fragments (5′ Position-3′ Position)
18 | 6930294CB1 | 1187 | 1091-1187 | 6917460H1 (PLACFER06) 217-927; GBI: g7939149_000003.2.edit 814-1187; 5118206F6 (SMCBUNT01) 1-506
19 | 7473018CB1 | 461 | | g1365166 1-461; GNN.g7651935_000011_002 21-293
20 | 7479221CB1 | 1884 | 591-773 | 6981403H1 (BRAIFER05) 1144-1758; 6618712H1 (BRAITDR02) 375-1024; 7269080H1 (OVARDIJ01) 977-1598; 1241675R6 (LUNGNOT03) 1311-1884; GBI.g7960351_edit 1-774
21 | 2923874CB1 | 2576 | 1-158 | 72004319V1 490-1356; 71998773V1 1899-2576; 7015044F8 (KIDNNOC01) 1-591; 72004394V1 1276-2171
22 | 55122335CB1 | 2000 | 1792-2000 | 8268324H1 (BLYRTXF01) 1-636; 70942077V1 1323-1994; 7699471H1 (KIDPTDE01) 578-1226; 71984458V1 1365-2000; 71986878V1 1255-1928; 55114534J1 677-1275
23 | 7473550CB1 | 3522 | 1-735, 2323-2786, 1799-1940, 930-1690, 2913-3522, 2140-2240 | FL7473550_g8102345_000001_g2981641 98-3522; GNN.g7076703_000017_002 1-312
24 | 7478108CB1 | 3277 | 709-831, 1-179, 3244-3277, 1957-2354 | 6926255F8 (PLACFER06) 2274-2745; 6923595F6 (PLACFER06) 1750-2603; 55142456J1 1-736; 55047371J1 666-1574; 55047372J1 1063-1960; 6926255H1 (PLACFER06) 2273-2689; 5329258F6 (DRGTNON04) 2801-3277; GBI.g9256180_000003_000004.edit 2708-3268
25 | 7482021CB1 | 1254 | 1-76 | g3016366 77-592; 7037834H1 (UTRSTMR02) 180-645; 1241675R6 (LUNGNOT03) 684-1254; 6450560H1 (BRAINOC01) 594-1238; GBI.g7960351_edit_1 1-147
26 | 7482145CB1 | 1120 | 806-854 | 70197639V1 1-448; GBI.g8516058_edit111 863; 70166902V1 661-1120
27 | 55022586CB1 | 4577 | 1-71, 4368-4577, 986-2809 | 55057844J1 583-1360; 71764468V1 2926-3719; 6920230F6 (PLACFER06) 1290-2134; 71760331V1 2434-3064; 71760332V1 3836-4381; 5763279F8 (PROSBPT02) 3899-4577; 71188036V1 2409-3037; 55022577H1 788-1479; 55057841H1 1-738; 71764426V1 3058-3795; 2725111T6 (OVARTUT05) 3772-4359; 6920230R6 (PLACFER06) 1589-2433
28 | 3238072CB1 | 1952 | 592-644, 1820-1952 | 71929643V1 659-1445; GBI.g10186764_000001.edit 1762-1952; 3238072F6 (COLAUCT01) 1127-1830; 7725175J1 (THYRDIE01) 1-685; 71928050V1 781-1460
29 | 7482034CB1 | 1092 | 1-181, 924-1092 | GBI.g9756020_000001.edit 1-174; GNN.g8217882_012 55-1092
30 | 7474351CB1 | 2847 | 1-290, 500-2847 | 60123248D3 901-1116; GBI: g9798436_CDS_1 1-2847; CpG_WDJ300089003.R1 1323-1489; 3532405H1 (KIDNNOT25) 960-1164
31 | 2232483CB1 | 1396 | 1-25 | 8094675H1 (EYERNOA01) 636-1096; 71152873V1 858-1396; 1628644F6 (COLNPOT01) 1-482; 60220501D1 459-833
32 | 7481712CB1 | 1853 | 1-873 | 6810286H1 (SKIRNOR01) 1-547; 55051982H1 876-1416; GNN: g5306288_002 90-1853
33 | 8213480CB1 | 3344 | 1-1904, 2575-2624 | 1479739H1 (CORPNOT02) 1592-1837; 7174969F8 (BRSTTMC01) 610-1253; 6831592H1 (SINTNOR01) 1-334; 72142924D1 1922-2427; 7663110F6 (UTRSTME01) 573-1021; 2786453T6 (BRSTNOT13) 2780-3344; 55113148H1 2325-3142; 6958043R8 (BLADNOR01) 196-644; 1252335T6 (LUNGFET03) 2663-3341; 7659180J1 (OVARNOE02) 1716-2302
34 | 7478405CB1 | 3389 | 563-3086 | 72420192D1 865-1338; g6702073 1-561; 4018316F8 (BRAXNOT01) 2756-3219; 58005173H1 2118-2793; 55123782H1 1340-2115; 55141002J1 567-1334; 55065490J1 1300-1997; 55123882J1 2095-2747; g1550049 3089-3389; 4293359F6 (BRABDIR01) 336-816
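In Table 4, each sequence fragment is listed with the 5′ and 3′ positions it occupies in the full-length polynucleotide. The following Python sketch is an illustrative reading aid, not a procedure described in the text: it merges those 1-based, inclusive intervals and reports any positions that none of the listed fragments covers. The text does not state that the listed fragments must span the whole sequence, so a reported gap is only an observation about the table; the function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (5' position, 3' position), 1-based, inclusive


def merge_intervals(fragments: List[Interval]) -> List[Interval]:
    """Merge fragments that overlap or abut into maximal covered intervals."""
    merged: List[Interval] = []
    for start, end in sorted(fragments):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def uncovered(fragments: List[Interval], length: int) -> List[Interval]:
    """Positions 1..length not covered by any fragment, as intervals."""
    gaps: List[Interval] = []
    next_pos = 1
    for start, end in merge_intervals(fragments):
        if start > next_pos:
            gaps.append((next_pos, start - 1))
        next_pos = max(next_pos, end + 1)
    if next_pos <= length:
        gaps.append((next_pos, length))
    return gaps


# SEQ ID NO: 18 (6930294CB1, 1187 nucleotides): 6917460H1, the GBI
# fragment, and 5118206F6.
row_18 = [(217, 927), (814, 1187), (1, 506)]
print(merge_intervals(row_18), uncovered(row_18, 1187))
```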

TABLE 5
Polynucleotide SEQ ID NO: | Incyte Project ID | Representative Library
18 | 6930294CB1 | CONFNOT03
20 | 7479221CB1 | LUNGNOT03
21 | 2923874CB1 | BRAINOT22
22 | 55122335CB1 | KIDEUNE02
24 | 7478108CB1 | PLACFER06
25 | 7482021CB1 | LUNGNOT03
26 | 7482145CB1 | COLITUT02
27 | 55022586CB1 | PROSTUS23
28 | 3238072CB1 | ESOGTUE01
30 | 7474351CB1 | KIDNNOT25
31 | 2232483CB1 | BRSTNOT05
32 | 7481712CB1 | SKIRNOR01
33 | 8213480CB1 | UTRSTME01
34 | 7478405CB1 | ENDMUNE01

TABLE 6
Library | Vector | Library Description
BRAINOT22 | pINCY | Library was constructed using RNA isolated from right temporal lobe tissue removed from a 45-year-old Black male during a brain lobectomy. Pathology for the associated tumor tissue indicated dysembryoplastic neuroepithelial tumor of the right temporal lobe. The right temporal region dura was consistent with calcifying pseudotumor of the neuraxis. Family history included obesity, benign hypertension, cirrhosis of the liver, hyperlipidemia, cerebrovascular disease, and type II diabetes.
BRSTNOT05 | PSPORT1 | Library was constructed using RNA isolated from breast tissue removed from a 58-year-old Caucasian female during a unilateral extended simple mastectomy. Pathology for the associated tumor tissue indicated multicentric invasive grade 4 lobular carcinoma. Patient history included skin cancer, rheumatic heart disease, osteoarthritis, and tuberculosis. Family history included cerebrovascular and cardiovascular disease, breast and prostate cancer, and type I diabetes.
COLITUT02 | pINCY | Library was constructed using RNA isolated from colon tumor tissue of the ileocecal valve removed from a 29-year-old female. Pathology indicated malignant lymphoma, small cell, non-cleaved (Burkitt's lymphoma, B-cell phenotype), forming a polypoid mass in the region of the ileocecal valve, associated clinically with intussusception and obstruction. The liver and multiple (3 of 12) ileocecal region lymph nodes were also involved by lymphoma.
CONFNOT03 | pINCY | Library was constructed using RNA isolated from mesenteric fat tissue removed from a 71-year-old Caucasian male during a partial colectomy and permanent colostomy. Pathology indicated mesenteric fat tissue associated with diverticulosis and diverticulitis with abscess formation. Approximately 50 diverticula were noted, one of which was perforated and associated with abscess formation in adjacent mesenteric fat. The patient presented with atrial fibrillation. Patient history included viral hepatitis, a hemangioma, and diverticulitis of the colon. Family history included extrinsic asthma, atherosclerotic coronary artery disease, and myocardial infarction.
ENDMUNE01 | pINCY | This 5′ biased random primed library was constructed using RNA isolated from untreated umbilical artery endothelial cell tissue removed from a newborn Caucasian male (Clonetics).
ESOGTUE01 | pINCY | This 5′ biased random primed library was constructed using RNA isolated from esophageal tumor tissue removed from a 61-year-old Caucasian male during a partial esophagectomy, proximal gastrectomy, pyloromyotomy, and regional lymph node excision. Pathology indicated an invasive grade 3 adenocarcinoma in the esophagus, extending distally to involve the gastroesophageal junction. The tumor extended through the muscularis to involve periesophageal and perigastric soft tissues. One perigastric and two periesophageal lymph nodes were positive for tumor. There were multiple perigastric and periesophageal tumor implants. The patient presented with deficiency anemia and myelodysplasia. Patient history included hyperlipidemia, and tobacco and alcohol abuse in remission. Previous surgeries included adenotonsillectomy, rhinoplasty, vasectomy, and hemorrhoidectomy. A previous bone marrow aspiration found the marrow to be hypercellular for age, with a cellularity-to-fat ratio of 95:5. The marrow was focally densely fibrotic. Granulocytic precursors were slightly increased with normal maturation. The estimate of blast cells was greater than 5%. Megakaryocytes were increased and appeared atypical in clusters. Storage cells and granulomata were absent. Patient medications included Epoetin, Danocrine, Berocca Plus tablets, selenium, vitamin B6 phosphate, vitamins E and C, and beta carotene. Family history included alcohol abuse, atherosclerotic coronary artery disease, type II diabetes, chronic liver disease, and primary cardiomyopathy in the father; and benign hypertension and cerebrovascular disease in the mother.
KIDEUNE02 | pINCY | This 5′ biased random primed library was constructed using RNA isolated from an untreated transformed embryonal cell line (293-EBNA) derived from kidney epithelial tissue (Invitrogen). The cells were transformed with adenovirus 5 DNA.
KIDNNOT25 | pINCY | Library was constructed using RNA isolated from kidney tissue removed from the left lower kidney pole of a 42-year-old Caucasian female during nephroureterectomy. Pathology indicated slight hydronephrosis and nephrolithiasis. Patient history included calculus of the kidney.
LUNGNOT03 | PSPORT1 | Library was constructed using RNA isolated from lung tissue of a 79-year-old Caucasian male. Pathology for the associated tumor tissue indicated grade 4 carcinoma. Patient history included a benign prostate neoplasm and atherosclerosis.
PLACFER06 | pINCY | This random primed library was constructed using RNA isolated from placental tissue removed from a Caucasian fetus who died after 16 weeks' gestation from fetal demise and hydrocephalus. Patient history included umbilical cord wrapped around the head (3 times) and the shoulders (1 time). Serology was positive for anti-CMV. Family history included multiple pregnancies and live births, and an abortion.
PROSTUS23 | pINCY | This subtracted prostate tumor library was constructed using 10 million clones from a pooled prostate tumor library that was subjected to 2 rounds of subtractive hybridization with 10 million clones from a pooled prostate tissue library. The starting library for subtraction was constructed by pooling equal numbers of clones from 4 prostate tumor libraries using mRNA isolated from prostate tumor removed from Caucasian males at ages 58 (A), 61 (B), 66 (C), and 68 (D) during prostatectomy with lymph node excision. Pathology indicated adenocarcinoma in all donors. History included elevated PSA, induration, and tobacco abuse in donor A; elevated PSA, induration, prostate hyperplasia, renal failure, osteoarthritis, renal artery stenosis, benign HTN, thrombocytopenia, hyperlipidemia, tobacco/alcohol abuse, and hepatitis C (carrier) in donor B; elevated PSA, induration, and tobacco abuse in donor C; and elevated PSA, induration, hypercholesterolemia, and kidney calculus in donor D. The hybridization probe for subtraction was constructed by pooling equal numbers of cDNA clones from 3 prostate tissue libraries derived from prostate tissue, prostate epithelial cells, and fibroblasts from prostate stroma from 3 different donors. Subtractive hybridization conditions were based on the methodologies of Swaroop et al., NAR 19 (1991): 1954 and Bonaldo et al., Genome Research 6 (1996): 791.
SKIRNOR01 | PCDNA2.1 | This random primed library was constructed using RNA isolated from skin tissue removed from the breast of a 17-year-old Caucasian female during bilateral reduction mammoplasty. Patient history included breast hypertrophy. Family history included benign hypertension.
UTRSTME01 | PCDNA2.1 | This 5′ biased random primed library was constructed using RNA isolated from uterus tissue removed from a 49-year-old Caucasian female during vaginal hysterectomy and bilateral salpingo-oophorectomy. Pathology for the matched tumor tissue indicated multiple (6) intramural leiomyomata. The patient presented with excessive menstruation, deficiency anemia, and dysmenorrhea. Patient history included abdominal pregnancy, headache, and chronic obstructive asthma. Previous surgeries included hemorrhoidectomy, knee ligament repair, and intranasal lesion destruction. Patient medications included Azmacort, Proventil, Trazodone, Zostrix HP, iron, Premarin, and vitamin C. Family history included alcohol abuse, atherosclerotic coronary artery disease, upper lobe lung cancer, and carotid endarterectomy in the father; breast fibroadenosis in the sibling(s); and acute myocardial infarction, liver cancer, acute leukemia, and breast cancer (central) in the grandparent(s).

TABLE 7
Program | Description | Reference | Parameter Threshold
ABI FACTURA | A program that removes vector sequences and masks ambiguous bases in nucleic acid sequences. | Applied Biosystems, Foster City, CA. |
ABI/PARACEL FDF | A Fast Data Finder useful in comparing and annotating amino acid or nucleic acid sequences. | Applied Biosystems, Foster City, CA; Paracel Inc., Pasadena, CA. | Mismatch < 50%
ABI AutoAssembler | A program that assembles nucleic acid sequences. | Applied Biosystems, Foster City, CA. |
BLAST | A Basic Local Alignment Search Tool useful in sequence similarity search for amino acid and nucleic acid sequences. BLAST includes five functions: blastp, blastn, blastx, tblastn, and tblastx. | Altschul, S. F. et al. (1990) J. Mol. Biol. 215: 403-410; Altschul, S. F. et al. (1997) Nucleic Acids Res. 25: 3389-3402. | ESTs: Probability value = 1.0E−8 or less; Full Length sequences: Probability value = 1.0E−10 or less
FASTA | A Pearson and Lipman algorithm that searches for similarity between a query sequence and a group of sequences of the same type. FASTA comprises at least five functions: fasta, tfasta, fastx, tfastx, and ssearch. | Pearson, W. R. and D. J. Lipman (1988) Proc. Natl. Acad. Sci. USA 85: 2444-2448; Pearson, W. R. (1990) Methods Enzymol. 183: 63-98; and Smith, T. F. and M. S. Waterman (1981) Adv. Appl. Math. 2: 482-489. | ESTs: fasta E value = 1.06E−6; Assembled ESTs: fasta Identity = 95% or greater and Match length = 200 bases or greater; fastx E value = 1.0E−8 or less; Full Length sequences: fastx score = 100 or greater
BLIMPS | A BLocks IMProved Searcher that matches a sequence against those in BLOCKS, PRINTS, DOMO, PRODOM, and PFAM databases to search for gene families, sequence homology, and structural fingerprint regions. | Henikoff, S. and J. G. Henikoff (1991) Nucleic Acids Res. 19: 6565-6572; Henikoff, J. G. and S. Henikoff (1996) Methods Enzymol. 266: 88-105; and Attwood, T. K. et al. (1997) J. Chem. Inf. Comput. Sci. 37: 417-424. | Probability value = 1.0E−3 or less
HMMER | An algorithm for searching a query sequence against hidden Markov model (HMM)-based databases of protein family consensus sequences, such as PFAM. | Krogh, A. et al. (1994) J. Mol. Biol. 235: 1501-1531; Sonnhammer, E. L. L. et al. (1998) Nucleic Acids Res. 26: 320-322; Durbin, R. et al. (1998) Our World View, in a Nutshell, Cambridge Univ. Press, pp. 1-350. | PFAM hits: Probability value = 1.0E−3 or less; Signal peptide hits: Score = 0 or greater
ProfileScan | An algorithm that searches for structural and sequence motifs in protein sequences that match sequence patterns defined in Prosite. | Gribskov, M. et al. (1988) CABIOS 4: 61-66; Gribskov, M. et al. (1989) Methods Enzymol. 183: 146-159; Bairoch, A. et al. (1997) Nucleic Acids Res. 25: 217-221. | Normalized quality score ≥ GCG-specified "HIGH" value for that particular Prosite motif. Generally, score = 1.4-2.1.
Phred | A base-calling algorithm that examines automated sequencer traces with high sensitivity and probability. | Ewing, B. et al. (1998) Genome Res. 8: 175-185; Ewing, B. and P. Green (1998) Genome Res. 8: 186-194. |
Phrap | A Phil's Revised Assembly Program including SWAT and CrossMatch, programs based on efficient implementation of the Smith-Waterman algorithm, useful in searching sequence homology and assembling DNA sequences. | Smith, T. F. and M. S. Waterman (1981) Adv. Appl. Math. 2: 482-489; Smith, T. F. and M. S. Waterman (1981) J. Mol. Biol. 147: 195-197; and Green, P., University of Washington, Seattle, WA. | Score = 120 or greater; Match length = 56 or greater
Consed | A graphical tool for viewing and editing Phrap assemblies. | Gordon, D. et al. (1998) Genome Res. 8: 195-202. |
SPScan | A weight matrix analysis program that scans protein sequences for the presence of secretory signal peptides. | Nielsen, H. et al. (1997) Protein Engineering 10: 1-6; Claverie, J. M. and S. Audic (1997) CABIOS 12: 431-439. | Score = 3.5 or greater
TMAP | A program that uses weight matrices to delineate transmembrane segments on protein sequences and determine orientation. | Persson, B. and P. Argos (1994) J. Mol. Biol. 237: 182-192; Persson, B. and P. Argos (1996) Protein Sci. 5: 363-371. |
TMHMMER | A program that uses a hidden Markov model (HMM) to delineate transmembrane segments on protein sequences and determine orientation. | Sonnhammer, E. L. et al. (1998) Proc. Sixth Intl. Conf. on Intelligent Systems for Mol. Biol., Glasgow et al., eds., The Am. Assoc. for Artificial Intelligence Press, Menlo Park, CA, pp. 175-182. |
Motifs | A program that searches amino acid sequences for patterns that match those defined in Prosite. | Bairoch, A. et al. (1997) Nucleic Acids Res. 25: 217-221; Wisconsin Package Program Manual, version 9, page M51-59, Genetics Computer Group, Madison, WI. |
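Several entries in the Parameter Threshold column of Table 7 are simple numeric cut-offs applied to each reported hit. The following Python sketch shows, under stated assumptions, how the BLAST, assembled-EST fasta, and Phrap cut-offs might be checked once hits have already been parsed. Only the numeric values come from Table 7; the `Hit` record, its field names, and the function names are hypothetical illustrations and are not part of any of the listed programs.

```python
from dataclasses import dataclass

# Numeric cut-offs transcribed from Table 7. The record type and function
# names below are hypothetical and only illustrate how the cut-offs combine.
BLAST_MAX_PROBABILITY = {"EST": 1.0e-8, "full_length": 1.0e-10}
FASTA_MIN_IDENTITY = 95.0      # percent identity, assembled ESTs
FASTA_MIN_MATCH_LENGTH = 200   # bases, assembled ESTs
PHRAP_MIN_SCORE = 120
PHRAP_MIN_MATCH_LENGTH = 56


@dataclass
class Hit:
    """Hypothetical summary of one alignment reported by a search program."""
    probability: float = 1.0   # probability value reported by BLAST
    identity: float = 0.0      # percent identity reported by fasta
    match_length: int = 0      # aligned length in bases
    score: float = 0.0         # alignment score reported by Phrap


def blast_hit_passes(hit: Hit, sequence_type: str) -> bool:
    """BLAST: probability value at or below the limit for ESTs or full-length sequences."""
    return hit.probability <= BLAST_MAX_PROBABILITY[sequence_type]


def assembled_est_fasta_hit_passes(hit: Hit) -> bool:
    """fasta on assembled ESTs: identity of 95% or greater and match length of 200 bases or greater."""
    return (hit.identity >= FASTA_MIN_IDENTITY
            and hit.match_length >= FASTA_MIN_MATCH_LENGTH)


def phrap_hit_passes(hit: Hit) -> bool:
    """Phrap: score of 120 or greater and match length of 56 or greater."""
    return (hit.score >= PHRAP_MIN_SCORE
            and hit.match_length >= PHRAP_MIN_MATCH_LENGTH)


if __name__ == "__main__":
    hit = Hit(probability=1.0e-9)
    # 1.0e-9 satisfies the EST limit (1.0e-8) but not the full-length limit (1.0e-10).
    print(blast_hit_passes(hit, "EST"), blast_hit_passes(hit, "full_length"))
```

Run as a script, the example prints `True False`, reflecting that the full-length BLAST cut-off in Table 7 is stricter than the EST cut-off.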

1. An isolated polynucleotide encoding (a) a polypeptide comprising the amino acid sequence depicted in SEQ ID NO: 1, (b) a biologically active fragment of the polypeptide that consists of the amino acid sequence depicted in SEQ ID NO: 1, wherein the fragment has cysteine protease activity, or (c) a polypeptide that has at least 95% sequence identity to the amino acid sequence depicted in SEQ ID NO: 1 and has cysteine protease activity.
2. The isolated polynucleotide of claim 1, wherein the isolated polynucleotide comprises the nucleic acid sequence depicted in SEQ ID NO: 18.
3. A recombinant polynucleotide comprising a promoter sequence operably linked to the polynucleotide of claim 1.
4. A cell transformed with the recombinant polynucleotide of claim 3.
5. A method of producing a polypeptide encoded by the polynucleotide of claim 1, comprising: a) culturing a cell, which has been transformed with a recombinant polynucleotide that comprises a promoter operably linked to the polynucleotide of claim 1, under conditions suitable for expression of the polypeptide, and b) recovering the expressed polypeptide.
6. The method of claim 5, wherein the polypeptide comprises the amino acid sequence of SEQ ID NO: 1.
7. An isolated polynucleotide selected from the group consisting of: (a) a polynucleotide comprising the nucleic acid sequence depicted in SEQ ID NO: 18, (b) a polynucleotide comprising a nucleic acid sequence that has at least 95% sequence identity to the nucleic acid sequence depicted in SEQ ID NO: 18, wherein the polynucleotide encodes a polypeptide that has cysteine protease activity, (c) a polynucleotide complementary to the polynucleotide of (a), (d) a polynucleotide complementary to the polynucleotide of (b), and (e) an RNA equivalent of (a)-(d).